Differences Between Autistic and Non-autistic Adults in the Recognition of Anger From Dynamic Expressions Remain After Controlling for Alexithymia.

A burgeoning literature suggests that alexithymia, and not autism, is responsible for the difficulties with static emotion recognition that are documented in the autistic population. Here we investigate whether alexithymia can also account for difficulties with dynamic facial expressions. Autistic and control adults (N = 60), matched on age, gender, non-verbal reasoning ability and alexithymia, completed an emotion recognition task, which employed dynamic point-light displays of emotional facial expressions that varied in speed and spatial exaggeration. The ASD group exhibited significantly lower recognition accuracy for angry, but not happy or sad, expressions with normal speed and spatial exaggeration. The level of autistic, and not alexithymic, traits was a significant predictor of accuracy for angry expressions with normal speed and spatial exaggeration. Our results demonstrate that, when autistic and control individuals are matched in terms of alexithymia, group differences in recognition accuracy remain, though these are restricted to angry (not happy or sad) expressions.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder, characterized by difficulties in social communication, and restricted and repetitive interests (American Psychiatric Association, 2013). Since the ability to infer emotion from facial expressions is important for social interaction, emotion recognition has long been suspected as a difficulty in ASD (Hobson, 1986). However, whilst many studies suggest a disparity in the facial emotion recognition ability of autistic[1] and neurotypical individuals (Ashwin, Chapman, Colle, & Baron-Cohen, 2006; Dziobek, Bahnemann, Convit, & Heekeren, 2010; Lindner & Rosén, 2006; Philip et al., 2010), there have been inconsistent findings, ranging from no differences between these individuals to large disparities (see Harms et al., 2010, Keating & Cook, 2020, Uljarevic & Hamilton, 2013 for reviews). Consequently, the question of whether autistic individuals exhibit atypical facial emotion recognition has been debated for over 30 years.
Furthermore, when groups are matched in terms of alexithymia, autistic and neurotypical adults perform comparably with respect to the recognition of emotion. Similarly, Milosavljevic et al. (2016) demonstrated lower emotion recognition scores (again from static face images) for autistic adolescents high in alexithymia relative to those low in alexithymia. Consequently, 'the alexithymia hypothesis' has been proposed: autistic individuals' difficulties in emotion processing, including facial emotion recognition, are caused by co-occurring alexithymia, not ASD.
To date, the majority of studies that have tested "the alexithymia hypothesis" have focused on the recognition of emotion from static face images and have thus overlooked the inherently dynamic nature of facial expressions (Kilts, Egan, Gideon, Ely, & Hoffman, 2003; Sato, Kochiyama, Yoshikawa, Naito, & Matsumura, 2004). Importantly, dynamic faces carry both spatial information about the configuration of facial features relative to each other and information about the kinematics (e.g. speed) of movement of facial features (Dobs, Bülthoff, & Schultz, 2018). Recent developments in the face processing literature emphasize the importance of both kinematic and spatial cues in neurotypical facial emotion recognition. Most notably, Sowden and colleagues (in press) manipulated point-light face (PLF) stimuli (a series of white dots on a black background that convey biological motion and eliminate contrast, texture, colour and luminance cues) such that expressions of happiness, anger and sadness were reproduced at 50%, 100% and 150% of their normal speed, and at 50%, 100% and 150% of their normal range of spatial movement (e.g. at the 150% level a smile would be 50% bigger / more exaggerated than normal). Sowden and colleagues (in press) found that the emotion recognition accuracy of neurotypical participants was modulated as a function of both spatial and kinematic manipulation. Specifically, when expressions were reduced in their speed and spatial extent (i.e. at the 50% level), participants were less accurate in their labelling of angry and happy expressions and more accurate for sad expressions. Conversely, when expressions were played with exaggerated spatial movement and greater speed (i.e. at the 150% level), participants displayed higher accuracy for angry and happy expressions and lower accuracy for sad expressions (Sowden et al., in press).
Thus, accuracy for labelling high arousal emotions (happy and angry) is improved when the stimulus is faster and more spatially exaggerated, whereas labelling of low arousal emotions (sad) is impaired. Recent literature therefore highlights that, for neurotypical individuals, both spatial and kinematic facial cues contribute to emotion recognition accuracy.
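To make the spatial and kinematic manipulations concrete, the sketch below shows one way such scaling could be implemented on point-light marker trajectories. The function name, the (frames, markers, 2) input format, and the use of the first frame as the neutral configuration are illustrative assumptions, not the authors' actual stimulus pipeline.

```python
import numpy as np

def manipulate_plf(trajectories, spatial=1.0, kinematic=1.0):
    """Scale a point-light face clip in space and time (illustrative).

    trajectories : array of shape (frames, markers, 2), x/y marker
        positions over time (hypothetical input format).
    spatial      : e.g. 0.5, 1.0 or 1.5; the displacement of each marker
        from its neutral (first-frame) position is scaled by this factor.
    kinematic    : e.g. 0.5, 1.0 or 1.5; the time axis is resampled so
        the clip plays at this multiple of normal speed.
    """
    neutral = trajectories[0]  # assume frame 0 is the neutral face
    spatial_scaled = neutral + spatial * (trajectories - neutral)

    n_frames = trajectories.shape[0]
    n_out = max(2, int(round(n_frames / kinematic)))  # faster -> fewer frames
    old_t = np.linspace(0.0, 1.0, n_frames)
    new_t = np.linspace(0.0, 1.0, n_out)
    out = np.empty((n_out,) + trajectories.shape[1:])
    for m in range(trajectories.shape[1]):       # each marker
        for d in range(trajectories.shape[2]):   # each coordinate
            out[:, m, d] = np.interp(new_t, old_t, spatial_scaled[:, m, d])
    return out
```

At the 150% spatial level, a marker that normally moves 10 units from neutral would move 15; at the 150% kinematic level, the same trajectory is traversed in two-thirds of the frames.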
Since dynamic information is particularly important in real-life processing of facial expressions (Krumhuber, Kappas, & Manstead, 2013), if the alexithymia hypothesis is to explain functional, everyday challenges, it must be the case that the co-occurrence of alexithymia can account for recognition difficulties with respect to both spatial and kinematic aspects of facial emotional expressions. That is, when autistic and non-autistic groups are matched in terms of alexithymia there should be no differences between the groups in the processing of either spatial or kinematic cues with respect to emotion recognition. Facial expression recognition from static stimuli relies on the processing of spatial cues but overlooks the contribution of kinematic cues. To the best of our knowledge, there are no studies that have investigated autistic versus neurotypical recognition of emotion from dynamic face stimuli whilst controlling for the influence of alexithymia. There are, however, some studies that have compared autistic and neurotypical processing of dynamic facial expressions without controlling for alexithymia. For example, Sato and colleagues (Sato, Uono, & Toichi, 2013) demonstrated that for neurotypical adults reducing the speed of movement of facial morph stimuli[2] reduced naturalness ratings; however, for autistic adults the effect of speed on naturalness ratings was significantly weaker. Sato and colleagues' results thus demonstrate differences, between autistic and non-autistic adults, in the effects of manipulating facial kinematics. To the best of our knowledge, only one study has examined the contribution of autistic and alexithymic traits to dynamic emotion recognition (Ola & Gullon-Scott, 2020).
The findings of this study support the alexithymia hypothesis: high alexithymic, but not autistic, traits were associated with less accurate facial expression recognition (Ola & Gullon-Scott, 2020). However, this study has two important limitations. First, only female participants were recruited. Since autistic males comprise three quarters of the ASD population (Loomes, Hull, & Mandy, 2017), and likely differ in behavioural phenotype (Ketelaars, In't Velt, Mol, Swaab, & Van Rijn, 2016; Rivet & Matson, 2011), one must be cautious about extrapolating the findings to autistic males.
Second, Ola and Gullon-Scott did not recruit a non-autistic control group. Consequently, the authors were not able to explore whether autistic versus non-autistic group differences in dynamic emotion recognition remain after controlling for alexithymia. That is, although Ola and Gullon-Scott were able to show that some difficulties with emotion recognition from dynamic stimuli were associated with alexithymia, one cannot conclude from this study that there are no differences with respect to emotion recognition from dynamic stimuli that are specifically associated with ASD.
The current study employed the paradigm developed by Sowden and colleagues (Sowden et al., in press) to test the hypothesis that when autistic and non-autistic groups are matched in terms of alexithymia there will be no differences, between the groups, in the processing of either spatial or kinematic cues with respect to emotion recognition. More specifically, male and female autistic adults and neurotypical controls rated the emotion expressed by PLF stimuli that had been manipulated such that expressions of happiness, anger and sadness were reproduced at 50%, 100% and 150% of their normal speed and spatial extent. The groups were matched in terms of their scores on a self-report measure of alexithymia.
We predicted that emotion recognition accuracy would be affected by both kinematic and spatial manipulation and that these effects would not interact with group, but rather that Bayesian statistics would provide support for the null hypothesis that the groups perform comparably. We further predicted that the effects of spatial and kinematic manipulation on emotion recognition accuracy would covary with scores on the self-report alexithymia measure. Footnote: [1] 'Disability-first' terminology is used throughout in line with the majority preference expressed in a survey of the autistic community (Kenny et al., 2016). [2] Facial morph stimuli were constructed by successively presenting 26 images from a neutral (0%) to full emotional (100%) expression, with an increase of 4% in emotion from one image to the next. Presenting the images in this way gave the illusion of a dynamic emotional expression. The speed of playback was then manipulated to allow the researchers to test their hypotheses.

Method

Participants
The chosen sample size is based on an a priori power analysis conducted using GLIMMPSE (Kreidler et al., 2013), which focused on replicating the primary results in the control group (the emotion*spatial and emotion*kinematic interactions). Using data from Sowden et al. (in press), 8 participants are required in the control group in order to have 95% power to detect an effect size of 0.70 (ηp²) at alpha level 0.01 for the emotion*spatial interaction. Moreover, 11 participants are required in the control group in order to have 95% power to detect an effect size of 0.53 (ηp²) for the emotion*kinematic interaction at alpha level 0.01. However, Button et al. (2013) argue that effect size estimates are commonly inflated ("the winner's curse"), and that there is "a common misconception that a replication study will have sufficient power to replicate an initial finding if the sample size is similar to that in the original study". Accordingly, we planned to recruit a larger number of participants (N = 30 per group; almost triple the largest sample size generated in our power calculations), in order to obtain adequate power. We pre-registered this sample size via the Open Science Framework (https://osf.io/kpefz). Participant characteristics are summarized in Table 1. The level of autistic characteristics of those in the ASD group was assessed using the Autism Diagnostic Observation Schedule, version 2 (ADOS-2; Lord et al., 2012). The mean total ADOS-2 score in the ASD group was 10.32 (see Supplementary Information B for information on the number of participants that met criteria for diagnosis). The MaRs-IB was used to match participants on the basis that the PLF task relies on non-verbal reasoning ability and, with respect to participant matching, task-specific measures of intelligence/ability have been argued to be more appropriate than general measures (Mottron, 2004).
A total of four participants (three in the ASD group and one in the control group) had AQ or TAS-20 scores over two standard deviations from their group mean. Since the general pattern of results was unaffected by their removal, these participants are included in the final analysis. Twenty-two of the 31 ASD participants were recruited via an existing autism research database kept by the Birmingham Psychology Autism Research Team (B-PART). The control and remaining nine ASD participants were recruited via social media (Facebook and Twitter) and Prolific, an online recruitment platform. All participants in the ASD group had previously received a clinical diagnosis of ASD from a qualified clinician.
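The two-standard-deviation screening rule described above can be sketched as follows; the function name and list-based input are illustrative, not the authors' analysis code.

```python
import statistics

def flag_outliers(scores, n_sd=2.0):
    """Flag scores more than n_sd standard deviations from the group mean.

    `scores` is a list of questionnaire totals (e.g. AQ or TAS-20) for
    one group; returns a parallel list of booleans (True = outlier).
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    return [abs(s - mean) > n_sd * sd for s in scores]
```

In the study, flagged participants were retained because removing them did not change the pattern of results; a rule like this only identifies candidates for such a sensitivity check.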

Autistic traits
The autistic traits of all ASD and control participants were assessed via the 50-item Autism Quotient (AQ; Baron-Cohen et al., 2001). This self-report questionnaire is scored on a range from 0 to 50, with higher scores representing higher levels of autistic characteristics. The AQ assesses five different domains relevant for ASD traits (attention switching, attention to detail, communication, social skill and imagination). The AQ has been widely used in both the general and the autistic population (Ruzich et al., 2015; Ruzich et al., 2016), and has strong psychometric properties, including internal consistency (α ≥ 0.7) and test-retest reliability (r ≥ 0.8; Stevenson & Hart, 2017).

Alexithymia
Alexithymia was measured via the 20-item Toronto Alexithymia Scale (TAS-20; Bagby et al., 1994). The TAS-20 comprises 20 items rated on a five-point Likert scale (ranging from 1, strongly disagree, to 5, strongly agree). Total scores on the TAS-20 can range from 20 to 100, with higher scores indicating higher levels of alexithymia. The TAS-20 is the most popular self-report tool for alexithymia and boasts good internal consistency (α ≥ 0.7) and test-retest reliability (r ≥ 0.7) (Bagby et al., 1994; Taylor, Bagby, & Parker, 2003).
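As a rough sketch, TAS-20 totals could be computed as below. The set of reverse-keyed items is assumed from the standard published scoring and should be checked against the scale manual; the function name is hypothetical.

```python
def score_tas20(responses, reverse_items=(4, 5, 10, 18, 19)):
    """Total a TAS-20 questionnaire (illustrative sketch).

    responses     : dict mapping item number (1-20) to a Likert response 1-5.
    reverse_items : item numbers that are reverse-keyed; the commonly
        published keying is assumed here (an assumption, verify it).
    Returns a total from 20 (low alexithymia) to 100 (high alexithymia).
    """
    total = 0
    for item in range(1, 21):
        r = responses[item]
        # reverse-keyed items flip the 1-5 scale: 1 -> 5, ..., 5 -> 1
        total += (6 - r) if item in reverse_items else r
    return total
```

Totals of 61 or above are treated as 'alexithymic' and 51 or below as 'non-alexithymic' later in the paper, with the range in between labelled 'possibly alexithymic'.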

Non-verbal reasoning
Non-verbal reasoning was assessed via the Matrix Reasoning Item bank (MaRs-IB; Chierchia et al., 2019).
Each item in the MaRs-IB consists of a 3 x 3 matrix. Eight of the nine available cells in the matrix are filled with abstract shapes, and one cell in the bottom right-hand corner is left empty. Participants are required to complete the matrix by selecting the missing shape from four possible options. In order to correctly identify the missing shape, participants have to deduce relationships between the shapes in the matrix (which vary in shape, colour, size and position). When participants select an answer, they move on to the next item. If participants do not provide a response within 30 seconds, they continue to the next item without a response. The MaRs-IB assessment lasts 8 minutes regardless of how many trials are completed. There is a total of 80 different items in the MaRs-IB, however participants are not required (or expected) to complete all 80 items within the 8 minutes. If a participant completed all 80 items within 8 minutes, the items were presented again but the responses to these were not analysed (following the procedure established by Chierchia and Fuhrmann et al., 2019). The MaRs-IB has been shown to have acceptable internal consistency (Kuder-Richardson 20 ≥ 0.7) and test-retest reliability (r ≥ 0.7; Chierchia et al., 2019).
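The presentation rules described above (a 30-second limit per item, an 8-minute session, and repeats after the 80th item that are shown but not scored) can be simulated as follows; this is a sketch of the timing logic only, with hypothetical names, not the actual task code.

```python
def mars_ib_session(response_times, n_items=80, item_limit=30.0,
                    session_limit=480.0):
    """Simulate MaRs-IB presentation rules (illustrative).

    response_times : seconds each attempted item takes; None means the
        participant timed out, which consumes the full 30-second limit.
    Returns (n_presented, n_scored): items shown within the 8-minute
    window, and how many count toward the score (items presented again
    after the 80th are shown but not analysed).
    """
    elapsed = 0.0
    presented = 0
    for rt in response_times:
        if elapsed >= session_limit:   # 8 minutes elapsed, session ends
            break
        t = item_limit if rt is None else min(rt, item_limit)
        elapsed += t
        presented += 1
    scored = min(presented, n_items)   # repeats beyond 80 are unscored
    return presented, scored
```

For example, a participant who answers every item in 5 seconds would see 96 items in 8 minutes, of which only the first 80 would be analysed.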

Procedure
Following a pre-registered design (see https://osf.io/kpefz), participants first completed the questionnaires (demographics followed by AQ, followed by TAS-20) and then moved on to the PLF task. Each trial in this task (see Fig. 1) began with the presentation of a stimulus, which comprised a silent PLF video of an actor expressing one of 3 emotions whilst saying a sentence, at one of the 3 spatial and 3 kinematic levels. After watching the video, participants were asked to rate how angry, happy and sad the person was feeling. Participants made their ratings on a visual analogue scale, with one end representing 'Not at all angry/happy/sad' and the opposite end representing 'Very angry/happy/sad'. Individuals were asked to make ratings for all three target emotions (angry, happy and sad) on scales, which were presented on screen in a random order, after each PLF video. Each trial took approximately 25 seconds to complete. Participants completed 3 practice trials (at the S2 and K2 level) and then 108 randomly ordered experimental trials (12 per condition) across three blocks. Participants were invited to take a break between blocks.
Following PLF task completion, participants completed the Matrix Reasoning Item Bank (MaRs-IB; Chierchia et al., 2019).
Participants completed all tasks online using Google Chrome or Mozilla Firefox on a computer or laptop. The frame rate (in frames per second; FPS) of their devices was measured to ensure that the quality/fluidity of the stimulus videos was not degraded. All participants' frame rates were 60 FPS or higher, with one exception at 50 FPS. When we ran all analyses with and without the 50 FPS participant, treating them as a potential outlier, the pattern of results was unaffected. Therefore, this participant was included in all analyses.

Statistical Analysis
The three emotion rating responses for each trial were transformed into scores from 0 to 10 (with 0 representing a response of 'Not at all' and 10 representing 'Very') to 3 decimal places. Emotion recognition accuracy scores were calculated as the correct emotion rating minus the mean of the two incorrect emotion ratings [3]. For instance, for a trial in which an angry PLF was presented, the mean ratings of the two incorrect emotions (happy and sad) were subtracted from the correct emotion (angry).
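The trial-level scoring rule can be expressed compactly; the dictionary-based input format is an illustrative assumption.

```python
def accuracy_score(ratings, target):
    """Continuous accuracy for one trial: the rating of the correct
    emotion minus the mean rating of the two incorrect emotions.

    ratings : dict with keys 'angry', 'happy', 'sad' and values on the
        transformed 0-10 scale.
    target  : the emotion the PLF stimulus actually expressed.
    Range is -10 (fully wrong) to +10 (fully right).
    """
    incorrect = [v for k, v in ratings.items() if k != target]
    return ratings[target] - sum(incorrect) / len(incorrect)
```

For an angry trial rated angry = 8, happy = 2, sad = 4, the score is 8 - (2 + 4) / 2 = 5.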
To test our first hypothesis, we submitted these accuracy scores to a 2 x 3 x 3 x 3 Analysis of Variance (ANOVA) with the between-subjects factor group (ASD, control) and the within-subjects factors emotion (happy, angry, sad), stimulus spatial level (S1, S2, S3), and stimulus kinematic level (K1, K2, K3). To test our second hypothesis, we applied a square-root transformation to all ordinal factors of interest (age, NVR, AQ, TAS-20), computed z-scores for the transformed data, and submitted the transformed z-scored data, along with the nominal predictor gender, to multiple regression analyses. The effect of the spatial manipulation (defined as the difference in accuracy between S3 and S1), the effect of the kinematic manipulation (defined as the difference in accuracy between K3 and K1), mean recognition accuracy and finally accuracy for angry videos at the normal level (S2, K2) were used as the DVs for each of these analyses. For all analyses, we used a p = .05 significance threshold to determine whether to accept or reject the null hypothesis. The frequentist approach was supplemented with the calculation of Bayes Factors, which quantify the relative evidence for one theory or model over another. For all Bayesian analyses, we followed the classification scheme used in JASP (Lee & Wagenmakers, 2014) to classify the strength of evidence given by Bayes factors, with BF10 between one and three considered as weak evidence, between three and ten as moderate evidence, and greater than ten as strong evidence for the alternative hypothesis. Footnote: [3] Many of the studies that have investigated the emotion recognition ability of autistic individuals have used forced-choice paradigms in which there is a binary (correct = 1, incorrect = 0) accuracy score for each trial. In order to facilitate comparison of our results to those studies, we also completed a binary accuracy analysis, which yielded similar results (see Supplementary Information C).
In this analysis, for each trial, participants scored 1 when they gave the highest rating to the correct emotion, and 0 when they rated either of the incorrect emotions higher than the correct emotion.
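The binary scoring rule, together with the Bayes-factor evidence bands described above, can be sketched as follows. Function names are hypothetical, and the treatment of tied ratings (scored 0 here) and the exact band boundaries are assumptions where the text is ambiguous.

```python
def binary_accuracy(ratings, target):
    """1 if the correct emotion received the strictly highest rating,
    else 0 (ties conservatively scored 0; an assumption)."""
    return int(all(ratings[target] > v
                   for k, v in ratings.items() if k != target))

def classify_bf10(bf10):
    """Label the strength of evidence for H1 given a Bayes factor,
    following the JASP-style bands used in the text (boundary handling
    at exactly 1, 3 and 10 is assumed)."""
    if bf10 > 10:
        return "strong"
    if bf10 >= 3:
        return "moderate"
    if bf10 > 1:
        return "weak"
    return "no evidence for H1"
```

For instance, the BF10 of 6.09 reported for the group difference at the S2K2 level falls in the "moderate" band.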

Results
Our primary hypothesis was that emotion recognition accuracy would be affected by both kinematic and spatial manipulation and that these effects would not interact with group. To test this hypothesis we conducted a mixed 2 x 3 x 3 x 3 ANOVA with the between-subjects factor group (ASD, control) and the within-subjects factors emotion (happy, angry, sad), stimulus spatial level (S1, S2, S3), and stimulus kinematic level (K1, K2, K3). This analysis revealed a significant emotion x spatial x kinematic x group interaction. Post-hoc independent-sample t-tests revealed that control, relative to ASD, participants had higher accuracy for angry videos at the 100% spatial (S2) and speed (K2) level [t(58) = 2.78, p bonf. < .05, mean difference = 1.48, BF10 = 6.09; Fig. 3]. Furthermore, the groups did not significantly differ at K1 (F(1,58) = .18, p > .05) or K3 (F(1,58) = 3.53, p > .05), but at K2, controls out-performed autistic participants (F(1,58) = 7.75, p < .01, ηp² = .12). These results suggest that, whilst controls improved in their accuracy for angry PLF stimuli across each level of increasing kinematic manipulation, for autistic participants, only the most extreme (K3) level of the kinematic manipulation resulted in an accuracy boost.

Multiple Regression Analyses
Our second hypothesis was that variation in emotion recognition accuracy would covary, not with ASD symptomatology, but with scores on the self-report alexithymia scale (TAS-20). To test whether autistic or alexithymic traits were predictive of the effect of the spatial and kinematic manipulations, we conducted two multiple regression analyses. For the first analysis, we used the effect of the spatial manipulation (defined as the difference in accuracy between S3 and S1) as the dependent variable; for the second, the effect of the kinematic manipulation (defined as the difference in accuracy between K3 and K1). In neither analysis was alexithymia a significant predictor. To explore the possibility that only extreme scores on the TAS-20 predict performance, we compared mean accuracy for alexithymic (i.e. TAS-20 ≥ 61) and non-alexithymic (i.e. TAS-20 ≤ 51) participants (according to the cutoff scores outlined by Bagby, Taylor, & Parker, 1994), excluding 'possibly alexithymic' individuals. An independent samples t-test confirmed that there was no significant difference in mean accuracy between these groups [t(48) = -.18, p = .861, mean difference = -.05, BF10 = 0.29].
Finally, building on our previous observation that the ASD and control groups differed in accuracy for angry videos at the normal (100%) spatial and speed level, we conducted a multiple regression analysis to identify the extent to which autistic and alexithymic traits were predictive of accuracy for angry videos at the S2 and K2 levels. This analysis revealed that autistic traits were a significant negative predictor of accuracy [standardized β = -.44, t(57)]. In order to ensure that AQ is not a significant predictor of accuracy for angry expressions at the normal spatial and speed level merely due to variation across other co-variables (e.g. age, gender, and non-verbal reasoning), we completed an additional three-step forced-entry hierarchical regression analysis following the procedures of Cook et al. (2013). Footnote: [4] See Supplementary Information E for a juxtaposition of the current data against data published by Sowden et al. (in press).

Discussion
The current study tested whether alexithymia accounts for recognition difficulties with respect to both spatial and kinematic aspects of facial emotional expression recognition in ASD. We hypothesized that the effects of spatial and kinematic manipulation on emotion recognition accuracy would covary with scores on a self-report alexithymia measure and would not be explained by group (ASD versus control) membership. In replication of Sowden et al. (in press), our results indicated that emotion recognition accuracy was affected by both spatial and kinematic manipulation. However, in conflict with our hypotheses, emotion recognition accuracy did not covary with alexithymia scores. Rather, we observed a significant emotion x spatial x kinematic x group interaction. Further unpacking this interaction revealed that autistic, relative to control, adults showed reduced recognition of angry expressions at the normal (100%) spatial (S2) and speed (K2) level. Furthermore, whilst control participants improved in accuracy across all kinematic levels, autistic participants only benefitted from the speed increase from the normal (100%) to increased (150%) speed level. In addition, multiple regression analyses revealed that autistic traits and NVR, but not age, gender or alexithymia, were significant predictors of recognition accuracy for angry videos at the normal spatial and speed level (where autistic traits were a negative predictor and NVR was a positive predictor). Our results demonstrate that when autistic and control individuals are matched in terms of alexithymia there are group differences in recognition accuracy, though these are restricted to angry (not happy or sad) expressions.
Establishing what can and cannot be explained by the alexithymia hypothesis is of major importance not only to academics working in the field but also to clinicians, for whom it is important to understand which aspects of behaviour and cognition are indicative of autism, and which are more representative of alexithymia. If the alexithymia hypothesis is to explain functional, everyday challenges with expression recognition, it must be the case that alexithymia can account for recognition difficulties with respect to both spatial and kinematic aspects of facial emotional expressions. Here we suggest that this is not the case. Self-reported alexithymia was not predictive of the effect of spatial or kinematic manipulation on emotion recognition, of emotion recognition accuracy in general, or of emotion recognition accuracy specifically relating to angry videos at the normal spatial and speed level. Furthermore, mean accuracy was comparable for alexithymic (i.e. TAS-20 ≥ 61) and non-alexithymic individuals (i.e. TAS-20 ≤ 51). Importantly, the mean alexithymia scores in our study (control mean = 55.66; ASD mean = 59.74) are comparable to those of Cook et al. (2013; control mean = 46.9; ASD mean = 55.6), and 46.67% of our participants could be identified as having alexithymia (TAS-20 ≥ 61) according to the TAS-20. Thus, it is not the case that the lack of an effect of alexithymia is due to abnormally low TAS-20 scores in our sample.
Though we do not refute the idea that difficulties with emotion recognition from static images of faces may be better explained by the presence of alexithymia rather than autism, we do not find evidence to support the claim that this argument extends to dynamic stimuli. Our data raise the possibility that alexithymic individuals experience difficulties that are specific to static stimuli and may be able to rely on dynamic information to compensate for difficulties with recognising emotions from static snapshots of faces. An alternative explanation, however, is that dynamic stimuli simply contain more information (i.e. spatial information about the configuration of facial features and kinematic information about changes over time). Consequently, it may be that tasks which use only static stimuli are more challenging and thus more sensitive to individual differences than those that employ dynamic stimuli. Nevertheless, accuracy scores on our task ranged between -6.05 and 9.92 (out of 10) and our task was able to index differences in anger recognition between autistic and control individuals. Thus, our task is clearly sensitive to individual differences in emotion recognition. Future studies, which specifically aim to test whether emotion processing difficulties in alexithymia are specific to static stimuli, should titrate the sensitivity of static and dynamic tasks and compare performance on both tasks.
Of particular note is our finding that differences between autistic and control individuals are restricted to the recognition of angry expressions. This finding is in line with previous research suggesting that angry expressions are better recognized by non-autistic compared to autistic individuals (Ashwin et al., 2006; Bal et al., 2010; Brewer et al., 2016; Leung, Pang, Brian, & Taylor, 2019; Song & Hakoda, 2018a) and is supported by meta-analytic evidence demonstrating greater differences between ASD and control groups in the recognition of angry compared to happy and sad expressions (Lozier, Vanmeter, & Marsh, 2014).
Importantly, however, some of these previous studies did not measure alexithymia (Ashwin et al., 2006; Bal et al., 2010; Leung, Pang, Brian, & Taylor, 2019; Song & Hakoda, 2018) and in those that did, alexithymic and ASD traits were highly confounded, making it impossible to determine whether differences in anger recognition were attributable to alexithymia or ASD. The present study resolves this ambiguity and suggests that difficulties with recognising angry expressions at the 'normal' spatial and speed level are related to autism, not alexithymia.
An important observation is that in the current paradigm both groups performed equally well for slowed angry expressions, but whilst the controls benefitted from the K1 to K2 speed increase (i.e. 50% to 100% speed), the autistic participants only benefitted from the K2 to K3 speed increase (i.e. accuracy only increased when the stimulus was played at 150% of normal speed). These findings raise the possibility that autistic individuals may have a higher 'kinematic threshold' for perceiving angry expressions (i.e. an angry expression has to be moving quite quickly before it actually appears angry, or angrier, to ASD participants). This idea builds upon the findings of a previous study that used static photographic stimuli at varying expressive intensities (constructed by repeatedly morphing a full expression with a neutral expression to result in 9 intensity levels for each emotion) to estimate identification thresholds (the intensity at which an expression is identified correctly on two consecutive trials) for autistic and control participants. The authors found that autistic individuals had significantly higher identification thresholds than controls, meaning that a higher intensity was necessary before an expression appeared angry to ASD participants. Importantly, this study also found no significant group differences in identification thresholds for happiness or sadness (Song & Hakoda, 2018). These findings suggest that autistic individuals have a different identification threshold for static angry expressions. For dynamic facial expressions, it may be that autistic and control individuals have a different 'kinematic identification threshold' such that the expression must move more quickly (than would be required for control individuals) before it is identified as angry. Further research is necessary to investigate whether the group difference in recognising angry expressions at the S2K2 level is underpinned by a difference in kinematic identification thresholds.
Another (non-mutually exclusive) explanation for why the autistic individuals may have particular difficulty recognizing angry expressions relates to movement production. Previous studies have documented differences between autistic and control participants in the production of facial expressions of emotion (Keating & Cook, 2020). In our study, we used PLF videos that were made by filming four neurotypical participants posing different emotional states. Given that autistic and neurotypical individuals may produce different facial expressions, and that one's own movement patterns influence the perception and interpretation of the movements of others (Cook, 2016; Eddy & Cook, 2018; Edey, Yon, Cook, Dumontheil, & Press, 2017; Happé, Cook, & Bird, 2017), our autistic participants might have struggled to read emotion in our PLF videos because the expressions were dissimilar to expressions that they would have adopted themselves. To date, studies that have documented differences between autistic and control participants in the production of facial expressions of emotion have used neurotypical observer ratings as a measure of the quality of facial expression (i.e. from the perspective of a neurotypical rater, autistic individuals produce expressions which appear "atypical"). Consequently, research has not yet identified what specifically is different about autistic and non-autistic facial expressions. Importantly, differences might be found in the final arrangement of facial features (i.e. spatial differences) or the speed/acceleration/jerk with which individuals reach these expressions (i.e. kinematic differences). Further research is necessary to a) characterize the expressive differences between autistic and neurotypical individuals, b) ascertain whether there are greater expressive differences between the groups for angry than happy and sad expressions and, c) confirm whether such differences in movement profile contribute to emotion recognition difficulties.

Limitations
In the present study, we aimed to produce statistically rigorous and replicable results. The standard alpha level (p < .05) has recently been called into question regarding its utility and appropriateness in psychological research (Amrhein & Greenland, 2018; Benjamin et al., 2018; Halsey, Curran-Everett, Vowler, & Drummond, 2015; Lakens et al., 2018). Hence, we are reassured to see that our main findings remain significant after Bonferroni correction and when we set a more conservative alpha threshold of 0.025. Importantly, substantial effect sizes and Bayes factors support our low p values, providing further confidence in our results. We therefore believe our findings make sound contributions to the literatures on alexithymia, ASD and dynamic facial expression recognition; however, there are several limitations to consider.
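As a concrete illustration of the correction mentioned above, a Bonferroni adjustment simply divides the nominal alpha by the number of tests performed. The following is a minimal sketch with hypothetical p values, not our actual results:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p values survive a Bonferroni correction:
    each p is compared against alpha divided by the number of tests."""
    adjusted_alpha = alpha / len(p_values)
    return [p < adjusted_alpha for p in p_values]

# With two tests at a nominal alpha of .05, the Bonferroni-adjusted
# threshold is .05 / 2 = .025 -- the same value as the conservative
# alpha threshold mentioned above.
```

Under this adjusted threshold, only p values below .025 would be declared significant for a family of two tests; findings that survive such a correction are less likely to be false positives.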
One potential limitation is that, due to COVID-19-related restrictions on face-to-face testing, only 22 of our ASD group completed ADOS-2 assessments. As a result, we have limited information about whether the remaining 9 participants would surpass the threshold for an autism or autism spectrum diagnosis on the ADOS-2. In addition, of the 22 participants who did complete the observational assessment, just 15 met criteria for a diagnosis. Hence, it is possible that our ASD group display less frequent or lower-intensity autistic behaviours than would typically be seen in an ASD population. In spite of this, we identified a significant group difference. Note that this limitation may have resulted in false negatives or an underestimation of the true effect size; however, it is highly unlikely that it could have resulted in false positives or inflated effect sizes.
Another potential limitation of this study is that we used the self-report TAS-20 to measure alexithymia.
Whilst 89% of studies comparing the emotional self-awareness of autistic and non-autistic participants use self-report measures (and 62% use the TAS-20; Huggins, Donnan, Cameron, & Williams, 2020), some authors (e.g. Leising, Grande, & Faber, 2009; Marchesi, Ossola, Tonna, & De Panfilis, 2014) have questioned their utility as "people with alexithymia, by definition, should not be able to report their psychological state" (Marchesi et al., 2014). However, endeavours to develop objective measures of alexithymia are in their infancy and early attempts are yet to be replicated (e.g. Gaigg, Cornell, & Bird, 2018), and thus self-report measures are necessary. Whilst the TAS-20 has long been the gold-standard tool for assessing alexithymia, there are some concerns that it might actually be a measure of psychopathology symptoms or current levels of psychological distress (see Badura, 2003; Helmes, McNeill, Holden, & Jackson, 2008; Leising et al., 2009; Marchesi et al., 2014; Preece et al., 2020; Rief, Heuser, & Fichter, 1996). Further studies may try to replicate our results using alternative measures of alexithymia such as the Perth Alexithymia Questionnaire (Preece, Becerra, Robinson, Dandy, & Allan, 2018) or the Bermond-Vorst Alexithymia Questionnaire (BVAQ; Vorst & Bermond, 2001), which have been argued to index an alexithymia construct that is distinct from individuals' current level of psychological distress (Preece et al., 2020). However, since our aim was to investigate whether the alexithymia hypothesis applies not only to emotion recognition from static face stimuli but also to recognition from dynamic stimuli, it was crucial for the design of the current study that we employ the same measure of alexithymia (i.e. the TAS-20) as has previously been employed in the emotion recognition literature (Milosavljevic et al., 2016; Oakley et al., 2016; Ola & Gullon-Scott, 2020).

Conclusions
The current study tested whether alexithymia can account for recognition difficulties with respect to both spatial and kinematic aspects of facial emotional expression recognition in ASD. In conflict with our hypotheses, emotion recognition accuracy did not covary with alexithymia scores. Rather, we observed that autistic, relative to control, adults showed reduced recognition of angry expressions at the normal (100%) spatial and speed level. Interestingly, whilst for controls recognition accuracy improved across all levels of the kinematic manipulation for angry videos, the autistic participants only benefitted from the 100% to 150% speed increase. Our results fail to provide evidence that 'the alexithymia hypothesis' extends to emotion recognition from dynamic face stimuli. Instead, our results draw attention to anger-specific differences in emotion recognition between autistic and non-autistic individuals. Future research should aim to elucidate why autistic individuals exhibit differences that are specific to angry expressions.

Declarations
Ethics approval and consent to participate The study has received approval from the STEM Ethics Committee at the University of Birmingham (ERN_16-0281AP9A) and from the Birmingham Participatory Autism Research Team (B-PART) Consultancy group. All participants gave consent to participate.

Consent for publication
Not applicable

Availability of data and materials
The dataset generated and analyzed during the current study is available in the Open Science Framework (OSF) repository, https://osf.io/6jw3f/wiki/home/

Competing interests
The authors declare that they have no competing interests.

Figure 1
Example of one trial in the PLF main task. The fixation cross display is presented for 500 ms at the start of each trial. The average length of a stimulus video was 2 seconds. Rating scales remained on screen until participants had rated the stimulus and pressed the space bar.