Extreme Demand Avoidance in Children with Autism Spectrum Disorder: Refinement of a Caregiver-Report Measure

Extreme/“pathological” demand avoidance (PDA) describes a presentation found in some children on the autism spectrum, characterized by obsessive resistance to everyday demands and requests. Demands often trigger avoidance behavior (e.g., distraction, excuses, withdrawal into role play). Pressure to comply can lead to escalation in emotional reactivity and behavior that challenges. Previously, the Extreme Demand Avoidance Questionnaire (EDA-Q) was developed to quantify resemblance to clinical accounts of PDA from caregiver reports. The aim of this study was to refine the EDA-Q using principal components analysis (PCA) and item response theory (IRT) analysis on parent/caregiver-report data from 334 children with ASD aged 5–17 years. PCA and IRT analyses identified eight items that are discriminating indices of EDA traits, and behave similarly with respect to quantifying EDA irrespective of child age, gender, reported academic level, or reported independence in daily living activities. The “EDA-8” showed good internal consistency (Cronbach’s alpha = .90) and convergent and divergent validity with other measures (some of which were only available for a subsample of 233 respondents). EDA-8 scores were not related to parental reports of ASD severity. Inspection of the test information function suggests that the EDA-8 may be a useful tool to identify children on the autism spectrum who show an extreme response to demands, as a starting point for more in-depth assessment.

"Pathological demand avoidance" (PDA) was coined in the 1980s to describe a profile seen in some children on the autism spectrum/with autistic features, characterized by obsessive resistance to everyday requests, plus strategic or "socially manipulative" behavior to avoid them (Newson, 1983; Newson et al., 2003). Routine requests triggered attempts to distract, elaborate excuses, negotiation, or withdrawal, which could escalate into threats, aggression, destructive behavior, or self-harm if pursued (Eaton & Weaver, 2020; Newson et al., 2003; O'Nions et al., 2018a, b; Stuart et al., 2019). Newson et al. (2003) argued that these behaviors did not reflect willful defiance, and suggested that the extreme response to demands was best construed as a panic attack. Newson et al. (2003) also described demand avoidant children as apparently insensitive to social hierarchy or age-appropriate behavior: they would often transgress social norms (e.g., behaving in ways that peers would view as embarrassing or bizarre). They were reportedly comfortable in role play and pretend, taking on others' roles as a "convenient way of being," such as assuming the role of a teacher and giving instructions to peers (Newson et al., 2003). Although for some this could reflect fluidity with regard to an intrinsic sense of social identity, these behaviors may also reflect camouflage/masking, described by some autistic people as a means to avoid unwanted social attention (e.g., Livingston et al., 2019).
Children described as having PDA showed extreme lability of mood, including sudden changes from loving to aggressive behavior, impulsivity, obsessions, passivity during infancy, and neurological "soft signs" such as motor clumsiness (Newson et al., 2003). They were as often girls as boys (Newson et al., 2003). Recent work has suggested other co-occurring features, including attempts to control situations and others' activities using coercive strategies (e.g., threats), elaborate excuses, sabotaging, and extreme aggression (Eaton & Weaver, 2020; O'Nions et al., 2018a, b). These behaviors are reportedly resistant to traditional reward- and consequence-based strategies (Eaton & Weaver, 2020). Newson et al. (2003) reported the findings of a discriminant function analysis for a sample recruited between 1975 and 2000. This analysis identified fewer "typical" autism features (e.g., difficulties with eye contact, lack of symbolic play, stereotypical motor mannerisms) in those with PDA compared to those with more typical autism/Asperger presentations. Strategies effective for children with "typical" autism, such as routine and repetition, were reportedly unhelpful for the demand avoidant group, who resisted the imposition of adult control. Instead, the demand avoidant group were said to benefit from strategies that were not rule based, such as using novelty to distract from perceived demands (Newson et al., 2003). Newson et al.'s (2003) findings may partly reflect a "collider bias" (Cole et al., 2010; O'Nions & Eaton, 2020): a counterintuitive bias whereby the relationship between two factors is distorted when both factors independently increase the chances of being included in a research study or clinical cohort (they "collide"). This bias may have occurred in Newson et al.'s (2003) sample.
Given the limited awareness of autism at the time and low estimates of autism prevalence (e.g., Wing & Potter, 2002), severe difficulties were likely needed to warrant a referral and assessment. Either severe autism features (i.e., severe social and communicative impairments, echolalia, stereotypies, etc.) or significant challenges with behavior (including avoidance of routine demands/PDA features) would have been sufficient for referral, and at least one of the two would have been necessary. Assuming that typical autism and PDA features are not part of a single dimension of autism severity, in Newson et al.'s (2003) sample those with the most severe PDA profiles would have comparatively fewer typical autism features, and those with the most severe typical autism profiles would have comparatively fewer PDA features.
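The collider argument can be illustrated with a small simulation. This is a hypothetical sketch, not a model of Newson et al.'s (2003) cohort: it assumes two independent, normally distributed traits (labeled "typical autism" and "PDA" for illustration) and an arbitrary referral rule in which a child is referred if either trait exceeds 1.5 standard deviations. Although the traits are unrelated in the full population, they become negatively correlated among referred children.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_collider(n=20000, threshold=1.5, seed=1):
    """Simulate referral when EITHER of two independent traits is severe.

    The threshold (in SD units) is a hypothetical referral rule.
    Returns (population r, referred-sample r, number referred).
    """
    rng = random.Random(seed)
    typical = [rng.gauss(0, 1) for _ in range(n)]
    pda = [rng.gauss(0, 1) for _ in range(n)]
    # Selection on "trait A OR trait B" is the collider.
    referred = [(t, p) for t, p in zip(typical, pda)
                if t > threshold or p > threshold]
    r_population = pearson(typical, pda)
    r_referred = pearson([t for t, _ in referred], [p for _, p in referred])
    return r_population, r_referred, len(referred)


if __name__ == "__main__":
    r_pop, r_ref, n_ref = simulate_collider()
    print(f"population r = {r_pop:.3f}; referred r = {r_ref:.3f} (n = {n_ref})")
```

Raising the threshold (i.e., making referral require more severe difficulties) strengthens the induced negative correlation, which is the pattern the argument above anticipates.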
Across the broader autism spectrum as we now know it, PDA characteristics may not be negatively related to more typical autism features. Indeed, analysis of clinical data from 2006 to 2010 suggested that those with PDA features shared similar qualitative impairments in social interaction, social imagination and pretend play, and rigid and repetitive behaviors and activities compared to those without (O'Nions et al., 2016).
The last ten years have seen a rapid increase in interest in PDA in the UK, largely driven by advocacy work by parent-led organizations and those with lived experience. A major impetus is that children with a PDA profile often experience severe challenges at home and school, struggling even in specialist settings (Christie et al., 2012; Gore Langton & Frederickson, 2016; O'Nions et al., 2018a, b; Ozsivadjian, 2020; PDA Society, 2019). A survey of nearly 1500 respondents conducted by the UK PDA Society revealed that, for many parents, adopting "PDA strategies," including indirect and non-confrontational approaches to making demands, had been helpful. Survey respondents reported that 70% of 969 young people were unable to tolerate their school environment or were home educated, highlighting substantial unmet need (PDA Society, 2019).
Despite enthusiasm from parents and those with lived experience, the concept of PDA has sparked disagreement and debate (Green et al., 2018; Woods, 2020). It has been argued that PDA should be viewed as a collection of symptoms rather than a syndrome (Green et al., 2018). However, there is emerging consensus that some children with ASD do present with a behavioral profile resembling PDA, evidenced by work from several independent groups (Eaton & Banting, 2013; Eaton & Weaver, 2020; Gillberg et al., 2015; Green et al., 2018; O'Nions et al., 2018a, b; Stuart et al., 2019), and international scholars who report that some children with ASD find routine demands aversive and may react to pressure to comply with avoidance and behavior that challenges (e.g., Agazzi et al., 2013; Lucyshyn et al., 2004, 2007). The difficulties experienced by young people and their families provide a clear imperative for further investigation of extreme demand avoidance (EDA) in children with ASD. Previously, the "Extreme Demand Avoidance Questionnaire" (EDA-Q) was developed to quantify traits described in accounts of PDA based on informant reports (O'Nions, Christie, et al., 2014a). Items drew on descriptive accounts of PDA (Newson et al., 2003), unpublished materials authored by Newson, and relevant items from the Diagnostic Interview for Social and Communication Disorders (DISCO) (Leekam et al., 2002). Items were reviewed by clinical experts. The pool of EDA-Q items was then refined by dropping items that failed to differentiate "PDA" and "non-PDA" groups, based on parental reports of their child's behaviors and whether they had been clinically identified or were suspected of having PDA (O'Nions, Christie, et al., 2014a).
The 26 items in the final version of the EDA-Q included questions focusing on avoidance of demands and social manipulation for the purposes of avoidance/controlling interactions (items 1, 2, 7, 11, 16, 21); insensitivity to hierarchy/praise/reputation with peers (items 5, 9, 12, 14, 20, 25); emotional lability in response to demands or perceived pressure (items 4, 13, 15, 22); need for control (items 3, 23); lack of responsibility/blaming (items 17, 18); mimicry and role play (items 6, 8, 10, 24); distractedness (item 19); and passivity (item 26). A single total score was generated for the scale, and cut-offs to identify those at risk of being clinically identified as having PDA were determined. However, without agreed-upon clinical criteria for PDA, it was not possible to objectively assess the validity of the measure or the cut-offs.
Principal components analysis (PCA) of the scale, which was not restricted to those reported to have a diagnosis of ASD, suggested that all but three items loaded onto the first component at > |.40| (O'Nions, Christie, et al., 2014a). However, given that the EDA-Q was designed as a checklist to quantify resemblance to Newson et al.'s (2003) description of PDA, the measure was not refined based on component loadings. Therefore, the EDA-Q may contain items that add little to, or possibly detract from, the quantification of one or more underlying dimensions. Items may also behave differently, relative to the scale as a whole, in males vs. females, younger vs. older children, those with higher vs. lower ability, or higher vs. lower independence in daily living activities. Refinement of the scale and analysis of item functioning in a sample of children with ASD is needed to improve the scale's reliability and assess the extent to which it can measure differing severity levels of EDA traits (or sub-dimensions) in children with ASD.
Non-compliance and emotional reactivity have become a focus for ASD research internationally. Questionnaires have been developed that are designed for ASD populations, such as the Emotion Dysregulation Inventory (EDI) (Mazefsky et al., 2018a, b; Mazefsky et al., 2020), which measures emotional reactivity and dysphoria; and the Home Situations Questionnaire - Pervasive Developmental Disorders version (HSQ) (Chowdhury et al., 2016), which measures resistance to daily demands. These scales afford opportunities to examine convergent validity with the EDA-Q. Other scales, such as the Strengths and Difficulties Questionnaire (Goodman, 1997), capture traits which, although elevated in PDA (O'Nions, Viding, et al., 2014b), are less conceptually central, and are therefore expected to diverge from EDA scores. Given that children with a PDA profile score lower on certain ASD characteristics compared to those with more typical autism/Asperger presentations, we would also expect EDA scores to diverge from measures of ASD severity.
The aims of the present study were therefore as follows: (1) to conduct psychometric analysis to refine the EDA-Q using parent/caregiver data for children reported to have ASD, (2) to explore convergent and divergent validity of the refined EDA measure compared to other scales, and (3) to explore whether the EDA measure shows a similar pattern of links to background factors and other dimensions of child behavior compared to those found for the Emotion Dysregulation Inventory (EDI) and the Home Situations Questionnaire (HSQ).

Participants
The present study analyzed data from parents/caregivers of children from two samples. Sample 1 was drawn from the previous EDA-Q study (O'Nions, Christie, et al., 2014a), which was approved by the King's College London Psychiatry, Nursing and Midwifery ethical review board. Sample 2 was drawn from a longitudinal study of parenting and child behavior approved by the KU Leuven Societal and Public Ethics Committee. Informed consent was obtained from all individual participants included in each of the studies. Respondents completed questionnaires in English.
Sample 1 participants were drawn from a volunteer sample of 326 parents/caregivers of children aged 5-17 years from several sources including schools, conferences for parents and professionals on the topic of PDA, and web-based forums/mailing lists, including those with a focus on PDA (see O'Nions, Christie, et al., 2014a for a complete description). One hundred and thirty-nine children in the sample were reported by parents/caregivers to have ASD. Although respondents were not asked their country of residence, given that the recruitment sources were UK networks, we anticipate that the vast majority of respondents were UK based.
Sample 2 participants were drawn from a volunteer sample of 393 parents of children aged 6-16 years, recruited via links posted on social networks by the research team and, at the request of the research team, by other organizations. Groups included, but were not limited to, those with a particular focus on PDA, since the aim was to recruit parents of children spanning a range of profiles. Parents who had expressed an interest in participating in research through direct contact with the researchers were also invited to participate. Participants were encouraged to share information with other parents in their networks to facilitate further recruitment. Two hundred and forty-eight respondents who provided data on the EDA-Q reported that their child had ASD, and scored them ≥ 12 (as per Mazefsky et al., 2018a, b) on the Social Communication Questionnaire (SCQ; Rutter et al., 2003). No ASD trait measure was available for Sample 1; thus, this extra criterion could not be applied. Almost all Sample 2 participants who met eligibility criteria were resident in the UK or Ireland (230, 99%). Table 1 describes how Sample 1 and Sample 2 were combined.

Procedure
Data were gathered through self-administered questionnaires collected on paper (Sample 1) and electronically (Samples 1 and 2), completed by the child's parent/caregiver. Parents/caregivers provided information about their child (e.g., their age, gender, diagnoses), plus information about their own highest educational qualification. They were asked to report on background characteristics including diagnoses their child had received and diagnoses that they suspected might apply to their child. For Sample 2, respondents also provided estimates of their child's academic ability relative to mainstream peers, their child's level of independence in daily living activities, their own age, socio-economic status, and the number of children in the family. Because recruitment was from community settings, no clinical data were available for either sample.

Measures
Measures Available for Samples 1 and 2
The 26-item Extreme Demand Avoidance Questionnaire (EDA-Q; O'Nions, Christie, et al., 2014a) was used to measure EDA traits. Items (described in Table 2) are rated on a four-point scale (0 = not true; 1 = somewhat true, 2 = mostly true, 3 = very true). Two items (14 and 20) are reverse scored. For Sample 2, parents reported on the severity of their child's difficulties within the past six months, while for Sample 1, they were not given a specific reference time frame. Cronbach's alpha for the 26-item EDA-Q was .92.
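The scoring rule above (0-3 ratings, items 14 and 20 reverse scored, single summed total) can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and the choice to reject incomplete or out-of-range responses is an assumption, since the text does not describe how missing data were handled.

```python
REVERSE_SCORED = {14, 20}   # reverse-scored EDA-Q items, per the text
MAX_RESPONSE = 3            # 0 = not true ... 3 = very true
N_ITEMS = 26


def score_eda_q(responses):
    """Total EDA-Q score from a dict {item_number: rating}.

    Items 14 and 20 are reverse scored (3 - rating) before summing.
    Assumption: incomplete or out-of-range responses raise ValueError
    rather than being imputed.
    """
    if set(responses) != set(range(1, N_ITEMS + 1)):
        raise ValueError("expected ratings for items 1-26")
    total = 0
    for item, rating in responses.items():
        if rating not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: rating must be 0-3")
        total += MAX_RESPONSE - rating if item in REVERSE_SCORED else rating
    return total
```

For example, a respondent answering "not true" (0) to every item scores 6, because the two reverse-scored items each contribute 3; the possible range is therefore 0 to 78.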
Data on the parent-report Strengths and Difficulties Questionnaire (SDQ) were also obtained (Goodman, 1997). The 25-item SDQ includes five subscales, each consisting of 5 items: Peer Problems, Conduct Problems, Hyperactivity, Emotional Problems, and Pro-social Behavior. Possible responses are "not true," "somewhat true," and "very true." Cut-off scores that estimate severity relative to the general population are available. Here, Cronbach's alpha values for the subscales were .53 for Peer Problems, .70 for Conduct Problems, .72 for Hyperactivity, .73 for Emotional Problems, and .75 for Pro-social Behavior.
Measures Available for Sample 2 Only
Child ASD severity was measured using the 40-item Social Communication Questionnaire (SCQ) - Lifetime Version (Rutter et al., 2003). Respondents are asked to respond "yes" or "no" to each of the items. Nineteen items focus on the entire developmental history, and 21 on the child's behavior when he/she was aged 4-5 years. Thirty-nine of the 40 items contribute to the total score, indexing the child's ASD severity (Rutter et al., 2003). The measure also contains three subscales: Social Interaction (15 items), Social Communication (13 items), and Rigid and Repetitive Behaviors and Interests (RRBIs; 8 items). Here, Cronbach's alpha was .81 for the total score, .74 for Social Interaction, .62 for Social Communication, and .63 for Rigid and Repetitive Behaviors and Interests.
The 30-item Emotion Dysregulation Inventory (EDI) was used to quantify observable signs of emotional dysregulation in children with ASD (Mazefsky et al., 2018a, b). The 24-item Reactivity subscale captures high arousal, aggression, emotional outbursts, rapid escalation in intensity, and extreme emotional responses. The 6-item Dysphoria subscale captures lower arousal, unease, anxiety, and low mood. Items are rated on a 5-point thermometer scale (0 = not at all, 1 = mild, 2 = moderate, 3 = severe, 4 = very severe), with severity capturing both frequency and intensity in the past week. The measure has excellent reliability and validity (Mazefsky et al., 2018a, b). Here, Cronbach's alpha was .97 for Reactivity and .88 for Dysphoria.
Non-compliance was measured using the Home Situations Questionnaire - PDD (HSQ; Chowdhury et al., 2010, 2016), quantifying the intensity of reactivity and problem behavior when faced with instructions, commands, or rules in the past month. Demand-Specific Non-compliance describes reactivity in routine contexts, and Socially Inflexible Non-compliance describes reactivity in less routine, or more social, situations (e.g., when visiting friends). For each of the 24 constituent items, a score of 0 designates no problems, 1-3 designates "mild," 4-6 designates "moderate," and 7-9 designates "severe" problems. Here, Cronbach's alpha was .91 for Demand-Specific Non-compliance and .89 for Socially Inflexible Non-compliance.
Table 1 notes (combined sample, N = 334): (a) For Sample 2 participants, an additional criterion was a score ≥ 12 on the SCQ. (b) We identified participants where the same respondent may have been included in both Sample 1 and Sample 2 by checking whether there were children with an identical combination of birth year, birth month, and gender in both samples. We retained the participants from the sample in which their combination of birth year, birth month, and gender was more numerous; i.e., if Sample 1 contained two individuals with the same combination but Sample 2 had only one, then the Sample 2 participant was dropped. If there was an identical number of matched participants (e.g., two in Sample 1 and two in Sample 2), we dropped the participants from Sample 1, because more measures were available for each participant in Sample 2.
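The de-duplication rule in Table 1, note (b), can be sketched as below. This is a hypothetical illustration: it assumes each participant is a record with `birth_year`, `birth_month`, and `gender` fields (names chosen here, not taken from the study data).

```python
from collections import Counter


def deduplicate(sample1, sample2):
    """Drop possible duplicate respondents across two samples.

    Each participant is a dict with 'birth_year', 'birth_month', 'gender'
    (hypothetical field names). For each combination present in both
    samples, keep the sample where it is more numerous; on a tie, drop
    the Sample 1 participants.
    """
    def key(p):
        return (p["birth_year"], p["birth_month"], p["gender"])

    n1 = Counter(key(p) for p in sample1)
    n2 = Counter(key(p) for p in sample2)
    shared = n1.keys() & n2.keys()
    # Ties (n2 == n1) are resolved in favor of Sample 2.
    drop_from_1 = {k for k in shared if n2[k] >= n1[k]}
    drop_from_2 = shared - drop_from_1
    kept1 = [p for p in sample1 if key(p) not in drop_from_1]
    kept2 = [p for p in sample2 if key(p) not in drop_from_2]
    return kept1, kept2
```

Matching on these three fields is deliberately conservative: it can also remove distinct children who happen to share a birth month and gender, which is the trade-off the rule accepts in exchange for not double-counting respondents.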

Data Analyses
Psychometric Analyses
The dimensionality of the EDA-Q items was first examined using principal components analysis (PCA) with a Varimax rotation. This allowed us to explore whether the EDA items reflect one or more underlying trait(s). Inspection of the scree plot was used to determine the number of components to inform extraction of the optimal number of subscales (Cattell, …). For IRT analysis, a two-parameter graded response model was used (GRM; Samejima, 1968), which can be estimated for questions with > 2 ordinal response categories. The GRM estimates a slope or "discrimination" parameter, reflecting how well the item differentiates severity levels of the trait. Score on the trait is modeled as a latent variable derived from the item set entered into the model. The severity of the underlying trait estimated for each respondent is termed Θ (theta), which is scaled to have a mean of 0.
The GRM also estimates threshold parameters for each item. Since EDA-Q items have four response options, three threshold parameters are estimated per item. Each threshold parameter gives the theta score at which there is a 50% probability of responding in a given category or higher. For each item, the first threshold parameter tells us the level of Θ needed for a likelihood ≥ 50% of endorsing "somewhat true" or higher rather than "not true." The second tells us the level of Θ needed for a ≥ 50% likelihood of endorsing "mostly true" or higher rather than "somewhat true" or lower, and the third the level needed for a ≥ 50% likelihood of endorsing "very true" rather than "mostly true" or lower. Together with the slope, the threshold parameters determine the item information function, which describes how precisely an item measures the trait at each level of Θ (Embretson & Reise, 2000; StataCorp, 2019).
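A minimal sketch of how the GRM turns a slope a and ordered thresholds b₁ < b₂ < b₃ into response probabilities. It assumes the common logistic parameterization P(X ≥ k | Θ) = 1 / (1 + exp(−a(Θ − b_k))); software packages may differ in scaling conventions, and the parameter values used below are hypothetical.

```python
import math


def boundary_probs(theta, a, thresholds):
    """P(X >= k | theta) for k = 1..K-1 under the graded response model."""
    return [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]


def category_probs(theta, a, thresholds):
    """P(X = k | theta) for k = 0..K-1, as differences of boundary curves.

    Thresholds must be increasing so every category probability is >= 0.
    """
    p_star = [1.0] + boundary_probs(theta, a, thresholds) + [0.0]
    return [p_star[k] - p_star[k + 1] for k in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # Hypothetical four-category item: 0 = not true ... 3 = very true.
    print(category_probs(0.0, 2.0, [-1.0, 0.0, 1.0]))
```

At Θ equal to a threshold b_k, the boundary probability P(X ≥ k) is exactly .5, which is the sense in which thresholds mark the 50% points described above.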
Following Mazefsky et al. (2018a, b), two criteria were applied to retain items based on GRM parameters. First, items needed to have slope (discrimination) parameters > 1. Second, items needed to have an information function with a peak > 1. Items were dropped if these criteria were not met, and IRT analysis was repeated with the reduced item bank until all items included met threshold levels. Item discrimination parameters were checked to examine local dependence (i.e., where one or more items are related to each other due to their proximity in the scale), which can be reflected in unusually high discrimination parameters (> 4) relative to other items on the scale (Nguyen et al., 2014).
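The retention rules can be sketched by computing each item's information function from its GRM parameters and checking its peak. This sketch assumes the same logistic GRM parameterization as above and uses Samejima's information formula, I(Θ) = Σ_k P′_k(Θ)² / P_k(Θ). The peak is found on an arbitrary grid from −4 to 4, and all function names and example parameters are hypothetical.

```python
import math


def _logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def item_information(theta, a, thresholds):
    """Fisher information of one GRM item at theta: sum_k P_k'(theta)^2 / P_k(theta)."""
    p_star = [1.0] + [_logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Derivative of each boundary curve; the fixed end points (1 and 0) have zero slope.
    d_star = [a * p * (1.0 - p) for p in p_star]
    info = 0.0
    for k in range(len(p_star) - 1):
        p_k = p_star[k] - p_star[k + 1]
        dp_k = d_star[k] - d_star[k + 1]
        if p_k > 0:
            info += dp_k ** 2 / p_k
    return info


def peak_information(a, thresholds, lo=-4.0, hi=4.0, step=0.01):
    """Maximum of the item information function over a theta grid."""
    n = int(round((hi - lo) / step))
    return max(item_information(lo + i * step, a, thresholds) for i in range(n + 1))


def screen_item(a, thresholds):
    """Retention rules from the text: slope > 1 and peak information > 1."""
    return {
        "keep": a > 1 and peak_information(a, thresholds) > 1,
        "possible_local_dependence": a > 4,
    }
```

Note that a slope above 1 is not sufficient on its own: an item with a = 1.2 but widely spaced thresholds (e.g., −3, 0, 3) spreads its information thinly across Θ, never peaks above 1, and is therefore dropped under the second criterion.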
As per Mazefsky et al. (2018a, b), differential item functioning (DIF) analysis was then conducted to explore whether items behaved similarly with respect to their quantification of the trait in (a) older vs. younger children; (b) males vs. females; (c) those with higher or lower parent-reported intellectual ability levels; and (d) those with higher vs. lower independence in daily living activities. An item is designated as showing DIF if it is more or less difficult to endorse, or more or less discriminating in one or other group (e.g., males or females). DIF analysis was conducted by fitting GRM models in which item parameters for one item at a time were allowed to vary by group, with all other items fitted with parameters constrained to be the same across both groups. The likelihood ratio test was used to explore the presence of DIF.
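The likelihood ratio test for DIF compares a model in which one item's parameters are constrained to be equal across groups with a model in which they are free to differ. The sketch below assumes that freeing one four-category item means freeing its slope and three thresholds, giving 4 degrees of freedom (an assumption about the model setup, not stated in the text). It uses the closed-form chi-square tail for even degrees of freedom so that only the standard library is needed. The log-likelihood values in the test are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for an even number of degrees of freedom.

    Closed form: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def dif_lr_test(loglik_constrained, loglik_free, df=4):
    """Likelihood ratio test for DIF on a single item.

    df = 4 assumes the slope and three thresholds of one item are freed.
    Returns (LR statistic, p value).
    """
    stat = 2.0 * (loglik_free - loglik_constrained)
    return stat, chi2_sf_even_df(stat, df)
```

With log-likelihoods of −1000 (constrained) and −995 (free), the statistic is 10 and p ≈ .040, which would flag DIF at the uncorrected p < .05 level used in this study.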
Items showing DIF were dropped from the refined version.
Phenotypic Analyses
Correlation analysis was used to examine convergent validity between the reduced EDA item set and other validated measures with strong conceptual relevance: EDI Reactivity, HSQ Demand Specific Non-compliance, and HSQ Socially Inflexible Non-compliance. Links were then explored between EDA and background/demographic factors, plus other dimensions of child behavior (SDQ Conduct Problems, Emotional Problems, Peer Problems, Hyperactivity, and Pro-social Behavior, and EDI Dysphoria), to examine divergent validity. Divergent validity was also investigated by measuring relations between EDA scores and ASD severity using the SCQ total score, plus scores for the Social Interaction, Social Communication, and RRBI subscales. Fisher's r to z transformation was used to examine differences in correlation magnitude for relations between EDA and conceptually similar subscales expected to converge with EDA scores (EDI Reactivity, HSQ Demand Specific Non-compliance, and HSQ Socially Inflexible Non-compliance) vs. conceptually distinct subscales expected to diverge from EDA scores (SDQ subscales, EDI Dysphoria, and SCQ subscales). Finally, in exploratory analyses, Fisher's r to z transformation was also used to compare correlations between EDA and the measures expected to show divergence with correlations between those same measures and EDI Reactivity, HSQ Demand Specific Non-compliance, and HSQ Socially Inflexible Non-compliance.
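Fisher's transformation z = atanh(r) makes correlation coefficients approximately normally distributed, so two coefficients can be compared with a z test. The sketch below shows the textbook version for two independent correlations. This is an assumption about the variant: the text does not specify one, and because the comparisons here share EDA-8 within the same sample, a test for dependent, overlapping correlations would be the more exact choice. Sample sizes in the example are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation, z = atanh(r)."""
    return math.atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent correlations."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p
```

For instance, comparing r = .66 with r = .20 at a hypothetical n = 200 each gives z ≈ 5.9, a clearly significant difference, whereas two identical coefficients give z = 0 and p = 1.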
Bonferroni correction for multiple comparisons was applied when interpreting results. Pearson correlation coefficients were calculated for continuous variables unless histograms revealed skew, in which case Spearman's rank coefficients were calculated. For Fisher's r to z transformations, we compared bivariate Spearman's rho coefficients and analyzed only data from Sample 2 to ensure that differences in sample size for the different subscales did not impact the results. Analyses were run using Stata 16.
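Spearman's rho is the Pearson correlation computed on ranks, with tied values assigned the mean of the ranks they span; the Bonferroni correction divides the nominal α by the number of tests. A minimal standard-library sketch follows (function names are illustrative). For the 21 descriptive comparisons reported below, .05/21 ≈ .0024, consistent with the p < .002 threshold used.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests
```

Because it depends only on ranks, rho is unaffected by the skew in EDA-8 scores that motivated its use here.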

Descriptive Data
Descriptive data are presented in Table 1 for Samples 1 and 2 separately and for the pooled sample. Accepting a Bonferroni-adjusted significance level of p < .002 (21 comparisons), Sample 1, on average, contained parents/caregivers reporting on older children with ASD, with fewer anxiety disorder diagnoses, more diagnoses of mild intellectual disability, and lower levels of hyperactivity and emotional symptoms. Fewer respondents in Sample 1 rated their child as scoring above the atypical threshold for emotional symptoms or total difficulties on the SDQ. Respondents in Sample 1 were less highly educated than those in Sample 2.

Psychometric Analyses
Following Hastings et al. (2005) and Benson (2010), a PCA with Varimax rotation was used to explore covariance among EDA-Q items. The scree plot showed an elbow at the second eigenvalue, so PCA was repeated with the number of extracted components constrained to one, which explained 38% of the variance. Items were deemed to meet criteria for retention in the scale if they loaded ≥ |.40| onto the extracted component (e.g., Benson, 2010). Twenty-three out of 26 items loaded onto the component at above-threshold levels (Table 2; items 14, 20, and 26 failed to load). The three items that did not meet this criterion were not taken forward to IRT analysis.
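A single-component PCA reduces to finding the leading eigenvalue and eigenvector of the item correlation matrix. Loadings are the eigenvector scaled by √λ₁, and the variance explained is λ₁ divided by the number of items. With only one component, Varimax rotation leaves the loadings unchanged, so the sketch omits it. It uses power iteration in pure Python and assumes a Pearson correlation matrix as input (the text does not state which correlation type was used). The 4 × 4 matrix in the test is hypothetical.

```python
import math


def first_component(corr, n_iter=1000, tol=1e-12):
    """Leading eigenpair of a correlation matrix by power iteration.

    Returns (eigenvalue, loadings, proportion_of_variance), where the
    loadings are the unit eigenvector scaled by sqrt(eigenvalue).
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings, eigenvalue / p


def retained(loadings, cutoff=0.40):
    """Retention rule from the text: absolute loading >= .40."""
    return [abs(l) >= cutoff for l in loadings]
```

In the test matrix, three items intercorrelate at .5 and a fourth correlates only .1 with each. The first three load at about .81, while the fourth loads at about .24 and would be dropped, mirroring how items 14, 20, and 26 were excluded.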
Analyses with Sample 1 and 2 separately identified the same three items that failed to load significantly in both samples. In Sample 2, two further items fell modestly below the cut-off (items 10 and 24). Because these loadings were close to the threshold (≥ .35), these items were taken forward to IRT analysis.
Item Response Theory (IRT) Analysis
For IRT analysis, a two-parameter graded response model was estimated (GRM; Samejima, 1969). We ran the first round of IRT including the 23 items taken forward from PCA (Table S1a). Four items had discrimination parameters < 1, and a further seven items had item information functions that peaked below 1; these items were dropped. A second round of GRM modeling was run with the 12 remaining items (Table S1b), for which item discrimination parameters were all > 1 and item information functions all peaked > 1. This 12-item set was taken forward to DIF analysis.
Differential Item Functioning Analysis
Following Mazefsky et al. (2018a, b), we ran GRM models to examine the presence of DIF between (a) males vs. females, (b) younger (< 12 years) vs. older children, (c) children reported to have similar ability to mainstream peers vs. children behind mainstream peers, and (d) children with more vs. less independence in daily living activities. Accepting a significance level of p < .05 uncorrected, results of the analysis (Table S2) revealed one item that behaved differently in relation to its quantification of trait EDA (theta) in older vs. younger children (item 17), and three that behaved differently in those with higher vs. lower ability (items 11, 13, and 25). These items were dropped from the refined version of the scale.
A final round of GRM modeling showed that all eight items met discrimination and item information thresholds (Table S3). Item information functions for each item are presented in Fig. 1a. The final item set is provided in full in Table 4. Cronbach's alpha for the EDA-8 item set was .90. Mean score was 17.31 (range = 0-24, standard deviation = 6.41, median = 19). Endorsement rates in the present sample for each of the eight items are presented in Fig. 2.
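Cronbach's alpha compares the sum of the item variances with the variance of the total score: α = k/(k − 1) × (1 − Σσ²ᵢ / σ²_total), where k is the number of items. A minimal sketch, assuming a complete respondents-by-items matrix (no missing data handling) and sample variances with an n − 1 denominator:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of respondents, each a list of item scores."""
    k = len(item_scores[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha rises when items covary strongly relative to their individual variances. This is why dropping weakly discriminating items can leave the eight-item scale with an alpha (.90) close to that of the full 26-item EDA-Q (.92), despite having far fewer items.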
Items were summed to generate a total score that we refer to henceforth as "EDA-8." Inspection of the test information function (Fig. 1b) revealed that the item set appears to provide maximum information for those scoring at approximately θ = − .67, which falls between the 24.25th and the 24.85th percentile in the present sample. This is equivalent to a total score between 12 and 13. Either side of this point, the standard error of measurement gradually increases.
According to Embretson and Reise (2000), an information score of 10 and a standard error estimate of around .31 at a given level of theta is equivalent to a reliability coefficient of .90. Inspection of the test information function (Fig. 1b) suggests that the EDA-8 offers this level of reliability for theta scores ranging from − 1.96 to .23, equivalent to scores between the 3.29th and the 57.19th percentiles in the present sample. This is equivalent to a raw score ranging from 2 to 19. Outside of this range, although the measure still offers some psychometric information, it is less precise at differentiating different levels of severity.
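The correspondence between information, standard error, and reliability follows from the standard IRT relations. Assuming θ is scaled to unit variance (σ²_θ = 1):

```latex
\mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}} = \frac{1}{\sqrt{10}} \approx .316,
\qquad
\rho \approx 1 - \frac{\mathrm{SE}^2}{\sigma_\theta^2} = 1 - \frac{0.1}{1} = .90
```

Higher test information therefore translates directly into a smaller standard error. This is why precision is greatest near the peak of the test information function and declines on either side.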

Phenotypic Analysis
Convergent Validity
To explore convergent validity, we first calculated correlations with measures conceptually relevant to EDA. Inspection of histograms (Figure S1) revealed that EDA-8 scores were skewed, so Spearman correlations were calculated. Correlations with EDA-8 were rs = .66 for EDI Reactivity, rs = .49 for Demand-Specific Non-compliance, and rs = .57 for Socially Inflexible Non-compliance (all p values < .001).
Links Between EDA-8, Background/Demographic Factors, and Other Scales
Correlations were calculated between EDA-8 scores and background/demographic factors (Table 5). None survived correction for multiple comparisons, although the negative association between EDA-8 scores and age (rs = −.18) was close to the Bonferroni-adjusted significance threshold. Considering measures of child behavioral challenge, EDA-8 scores were significantly related to all SDQ subscales, with coefficients between −.20 and .24 for all except Conduct Problems, for which the coefficient was .65. EDA-8 scores were also robustly related to EDI Dysphoria (rs = .47). No links between EDA-8 and SCQ total or subscale scores (indexing ASD severity) survived correction for multiple comparisons (all |rs| < .13). We compared the correlation coefficients for measures hypothesized to show convergence or divergence with EDA using Fisher's r to z transformation (see Table S4). Conceptually relevant measures (EDI Reactivity, Demand Specific Non-compliance, and Socially Inflexible Non-compliance) showed significantly stronger relations with EDA-8 scores than did subscales identified as conceptually distinct, with two exceptions: SDQ Conduct Problems, which showed a similar relationship with EDA to all three conceptually relevant measures, and EDI Dysphoria, which showed a similar relationship with EDA to Demand Specific Non-compliance and Socially Inflexible Non-compliance.
Links Between Other EDA-Relevant Measures, Background/Demographic Factors, and Other Scales
Considering EDI Reactivity, no significant links were found with demographic and background factors. For both Demand Specific Non-compliance and Socially Inflexible Non-compliance, the only significant association was with parental reports of the child's lack of independence in daily living activities, which was related to greater non-compliance in response to instructions, commands, or rules on both subscales (rs = .45 and .35, respectively).
As with the EDA-8, EDI Reactivity, Demand Specific Non-compliance, and Socially Inflexible Non-compliance were most strongly linked with the SDQ Conduct Problems (r = .55, .41, and .42, respectively) and EDI Dysphoria subscales (r = .68, .46, and .46, respectively). Relations with other subscales were more modest and did not survive Bonferroni correction, with two exceptions: EDI Reactivity was significantly positively related to Emotional Problems (r = .28), and Socially Inflexible Non-compliance was significantly negatively related to Prosocial Behavior (r = −.29). In terms of links with ASD severity, only Demand Specific Non-compliance showed a positive link with RRBIs that survived correction for multiple comparisons (rₛ = .22). Relations between the EDA-8 and SCQ total score are presented in Figure S2 (rₛ = .07, p > .1).
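The Bonferroni correction referred to above tests each coefficient against alpha divided by the number of tests in the family. The minimal sketch below assumes a nominal alpha of .05 and uses hypothetical inputs: it computes Spearman's rₛ as the Pearson correlation of tie-averaged ranks and then applies a Bonferroni screen to a set of p-values. It does not reproduce the study's test families or p-values.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bonferroni_survivors(p_values, alpha=0.05):
    """Return the adjusted threshold alpha / m and the tests with p below it.

    Assumption: alpha = .05 is a placeholder nominal level.
    """
    threshold = alpha / len(p_values)
    return threshold, [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values for one family of three tests (not study results)
p_values = {"measure_a": 0.001, "measure_b": 0.004, "measure_c": 0.03}
threshold, kept = bonferroni_survivors(p_values)
print(f"threshold = {threshold:.4f}; survivors: {kept}")  # survivors: ['measure_a', 'measure_b']
```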
In exploratory analyses, we compared the magnitude of the correlation coefficients for the EDA-8 and each of the measures expected to show divergence with the corresponding coefficients for EDI Reactivity, Demand Specific Non-compliance, and Socially Inflexible Non-compliance (Table S5). Coefficients were similar, with the following exceptions: SDQ Conduct Problems was more strongly related to the EDA-8 than to either Demand Specific Non-compliance or Socially Inflexible Non-compliance, and EDI Dysphoria was more strongly related to EDI Reactivity than to the EDA-8.

Discussion
The aim of the present study was to conduct a psychometric analysis to refine the EDA-Q using data from parents/caregivers of children reported to have an ASD diagnosis.
The goal was to determine whether one or more dimensions best described EDA-Q items in an ASD sample, and to drop items that were not sufficiently discriminating or that behaved differently with respect to quantifying EDA depending on the child's age, gender, ability level, or independence in daily living activities.
In line with previous analyses, we found that 23 of the 26 EDA-Q items loaded significantly onto a single underlying component (O'Nions, Christie, et al., 2014a). IRT analysis revealed that, adopting cut-offs used by Mazefsky et al. (2018a, b) to develop the EDI, 12 out of the 23 items showed sufficient discrimination of different levels of EDA to justify their retention in the refined scale.
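To illustrate what loading onto a single component means, the sketch below extracts the first principal component of an item correlation matrix by power iteration and reports loadings (the eigenvector scaled by the square root of its eigenvalue). It is a generic illustration: the study's software, correlation type (e.g., Pearson vs. polychoric), and loading criterion are not stated here, and the 3-item matrix is hypothetical.

```python
import math


def first_component_loadings(corr, iterations=1000, tol=1e-12):
    """First principal component of a correlation matrix via power iteration.

    Returns (eigenvalue, loadings), where loadings = eigenvector * sqrt(eigenvalue).
    Assumes the leading eigenvalue is strictly dominant, as is typical when
    all items correlate positively.
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply by the matrix and renormalize to unit length
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient w' R w estimates the leading eigenvalue
        new_eigenvalue = sum(w[i] * sum(corr[i][j] * w[j] for j in range(n)) for i in range(n))
        converged = abs(new_eigenvalue - eigenvalue) < tol
        v, eigenvalue = w, new_eigenvalue
        if converged:
            break
    # Eigenvector sign is arbitrary; orient it so loadings are mostly positive
    if sum(v) < 0:
        v = [-x for x in v]
    return eigenvalue, [x * math.sqrt(eigenvalue) for x in v]


# Hypothetical 3-item correlation matrix (not study data)
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
eigenvalue, loadings = first_component_loadings(R)
print(round(eigenvalue, 3), [round(l, 3) for l in loadings])
```

The sum of squared loadings equals the component's eigenvalue, i.e., the item variance it accounts for.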
The next stage of the analysis investigated whether any items behaved differently with respect to quantifying EDA as a function of child age, gender, ability, or independence. One item was found to behave differently by child age, and three by parent-reported child ability. After these items were dropped, eight items remained, which formed the refined version (the "EDA-8"). Inspection of item content suggests that the retained items cover the features consistently described in accounts of PDA: obsessive avoidance of demands and requests, outrageous or shocking behavior to avoid, need for control, poor awareness of hierarchy, and lability of mood.
The EDA-8 showed good convergent and divergent validity: relations with relevant measures (i.e., Reactivity and Demand Specific/Socially Inflexible Non-compliance) were stronger than links with other dimensions hypothesized to show divergence (hyperactivity, emotional symptoms, peer problems, and lack of prosocial behavior). Furthermore, relations between the EDA-8 and measures hypothesized to show divergence were broadly similar to relations for each of these subscales with Reactivity and Demand Specific/Socially Inflexible Non-compliance. The magnitudes of the relations between the EDA-8 and both Reactivity and Conduct Problems (≥ .65) were striking, particularly given that the only content these measures share with the EDA-8 is lability of mood. One possible explanation is that very high levels of EDA characteristics such as demand avoidance and need for control can trigger escalation in reactivity and behavior that challenges (e.g., aggression) when things are not on the child's terms.
In the present ASD sample, links between EDA-8 scores and ASD severity were weak and non-significant (Figure S2). The implication of these findings is that those with more severe ASD are not more likely to show PDA characteristics; instead, these characteristics appear similarly likely to occur across the range of ASD severity. This is in line with previous work showing that those with high levels of PDA features had qualitative impairments in social interaction, social imagination and pretend play, and rigid and repetitive behaviors and activities similar to those of individuals without high levels of PDA features (O'Nions et al., 2016). These results are consistent with the suggestion that Newson et al.'s (2003) findings may have been affected by a collider bias (O'Nions & Eaton, 2020).
These observations call into question the assumption, derived from Newson et al.'s (2003) sample, that ASD strategies such as routine and repetition are unhelpful for those with PDA features. If that sample was affected by collider bias, such strategies may instead be less helpful only for some children with EDA. Similarly, while novelty and flexibility may benefit some with EDA (Newson et al., 2003), others may respond well to routine-based approaches with adaptations to reduce emotional reactivity and make routine tasks more enjoyable and rewarding (e.g., Lucyshyn et al., 2015).
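The collider-bias argument can be illustrated with a toy simulation: two traits generated independently become negatively correlated once a sample is restricted to cases whose combined score passes a threshold (for example, selection into a specialist clinic). The parameters and the selection rule are arbitrary and hypothetical; this is a generic illustration, not a model of how Newson et al.'s (2003) sample was actually assembled.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_collider(n=20000, threshold=1.5, seed=1):
    """Return (r in the full population, r in the selected subsample, n selected)."""
    rng = random.Random(seed)
    # Two traits drawn independently, so their population correlation is zero
    trait_a = [rng.gauss(0, 1) for _ in range(n)]
    trait_b = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical selection rule: only cases with a high combined score enter the sample
    keep = [i for i in range(n) if trait_a[i] + trait_b[i] > threshold]
    sel_a = [trait_a[i] for i in keep]
    sel_b = [trait_b[i] for i in keep]
    return pearson(trait_a, trait_b), pearson(sel_a, sel_b), len(keep)


r_full, r_selected, n_selected = simulate_collider()
print(f"full population r = {r_full:.2f}; selected sample r = {r_selected:.2f} (n = {n_selected})")
```

Within the selected group, a case that scores low on one trait must score high on the other to have been included, which produces the spurious negative association.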
Inspection of the test information function revealed that the level of EDA can be estimated with greatest precision at a theta score of −.67, equivalent to a total score between 12 and 13. On either side of this point, measurement precision decreases. In particular, the EDA-8 appears to have difficulty capturing individual variability in severity for scores above 19. The EDA-8 could therefore be used to identify whether further investigation of EDA features, including the application of more detailed measures, is warranted. If the goal is to capture variability among individuals with high levels of EDA traits, it may be advisable to use additional, more comprehensive tools.
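A test information function is the sum of the item information functions, and the standard error of a trait estimate is 1/√I(θ). The sketch below assumes Samejima's graded response model with four response categories per item (scored 0 to 3, so a total of 0 to 24) and hypothetical item parameters, not the EDA-8 estimates. It locates the θ of maximum information on a grid and converts θ to an expected total score via the test characteristic curve; reproducing the reported θ = −.67 and total of 12 to 13 would require the study's own parameters.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def category_probs_and_slopes(theta, a, thresholds):
    """Graded response model: P_k(theta) and dP_k/dtheta for categories k = 0..m-1.

    thresholds must be strictly increasing (b1 < b2 < ...) so that every P_k >= 0.
    """
    # Boundary curves P*_k = P(score >= k), padded with P*_0 = 1 and P*_m = 0
    star = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # dP*_k/dtheta = a * P*_k * (1 - P*_k); this is 0 at the two padded ends
    d_star = [a * p * (1.0 - p) for p in star]
    probs = [star[k] - star[k + 1] for k in range(len(star) - 1)]
    slopes = [d_star[k] - d_star[k + 1] for k in range(len(star) - 1)]
    return probs, slopes


def item_information(theta, a, thresholds):
    """Fisher information of one item: sum over categories of (dP_k/dtheta)^2 / P_k."""
    probs, slopes = category_probs_and_slopes(theta, a, thresholds)
    return sum(s * s / p for p, s in zip(probs, slopes) if p > 0)


def total_information(theta, items):
    """Test information: the sum of the item information functions."""
    return sum(item_information(theta, a, bs) for a, bs in items)


def expected_total_score(theta, items):
    """Test characteristic curve: expected sum score, each item scored 0..m-1."""
    total = 0.0
    for a, bs in items:
        probs, _ = category_probs_and_slopes(theta, a, bs)
        total += sum(k * p for k, p in enumerate(probs))
    return total


# Hypothetical parameters for eight 4-category items: (discrimination, [b1, b2, b3])
ITEMS = [(2.0, [-2.0, -0.8, 0.5])] * 4 + [(1.6, [-1.5, -0.5, 0.8])] * 4

grid = [i / 100 for i in range(-400, 401)]  # theta from -4 to 4 in steps of 0.01
theta_max = max(grid, key=lambda t: total_information(t, ITEMS))
info_max = total_information(theta_max, ITEMS)
print(f"peak information {info_max:.2f} at theta = {theta_max:.2f}, "
      f"SE = {1 / math.sqrt(info_max):.2f}, "
      f"expected total = {expected_total_score(theta_max, ITEMS):.1f} of 24")
```

Because information falls off away from the item thresholds, a scale whose thresholds cluster in the middle of the trait range measures high scorers less precisely, which is the pattern described above for scores above 19.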
The EDI and the HSQ measures, which captured variability in the present sample (Figure S1), may be well suited for more in-depth measurement of day-to-day challenges. The strong relationship between EDA and EDI Dysphoria highlights this as an important area of difficulty in those with EDA traits, which has been overlooked in previous work. The EDI has good psychometric properties and has been normed (Mazefsky et al., 2018a, b). Cut-offs are available to identify clinically significant difficulties.
The HSQ may be a useful tool for exploring the types of activities that trigger avoidance or behavior that challenges. This measure was identified as having favorable measurement properties compared to other measures of behavior problems in young children with autism (McConachie et al., 2015). Items in the HSQ (described in depth in Chowdhury et al., 2016, Table 2) could be helpful in identifying particular triggers of avoidance and understanding how avoidance affects daily life. However, some parents in the current sample anecdotally reported that the HSQ was difficult to complete because they do not usually use "instructions, commands, or rules" for fear of triggering behavior that challenges.
Clinical accounts of PDA highlight a range of concerns, only some of which are covered in the EDA-8 and other measures described here. Omissions include attempts to control others' activities, which in some children may include coercive behavior. Other reported challenges include blaming or targeting others, sabotage, and difficulty taking responsibility (Eaton & Weaver, 2020;Newson et al., 2003;O'Nions et al., 2018a, b). Parental accounts suggest that these behaviors can have a very significant impact, making them important targets for measurement and intervention.

Limitations and Future Research Directions
Limitations of the present study include the lack of clinical data (e.g., gold-standard diagnostic instruments), reliance on informant report of diagnoses, reliance on a single method of data collection (i.e., questionnaires), and reliance on a single informant (one parent/caregiver). Further multimethod investigation is needed in a sample who have received standardized clinical assessments. We note that Chowdhury et al. (2016) reported a similar pattern of links between the HSQ and other measures in a clinic-based ASD sample. Although common-rater bias could have inflated the strength of detected relations, we were able to detect differential links across measures, suggesting that such bias did not compromise the findings.
A further consideration is that the sampling approach used in the present study and the fact that the research team were known for previous research on PDA is likely to have led to an overrepresentation of children with EDA features. In the present sample, avoidance of demands, need for control, lack of sensitivity to hierarchy, and lability of mood could therefore appear to be more strongly linked than they are in general in children with ASD or other neurodevelopmental profiles. This could be due to common underlying factors/developmental processes that influence their co-occurrence (see O'Nions & Eaton, 2020, for a discussion), and/or because parents of children showing these co-occurring features were more likely to self-select into the study. Work is needed to investigate these possibilities in population representative ASD cohorts.
In recent work, Eaton and colleagues outlined criteria used to identify individuals with ASD who show a PDA presentation (Eaton & Weaver, 2020;O'Nions & Eaton, 2020). It remains to be seen whether the EDA-8 can discriminate those who meet these criteria from those for whom demand avoidance may be more time-limited and context specific. More work is needed to investigate whether a cut-off can be identified for the EDA-8, or whether measurement using more comprehensive tools would be required.
Further research is needed to explore stability of EDA traits. Here, we found a modest negative association between age and EDA-8 scores, which approached significance. Work by Gillberg et al. (2015) suggests that the majority of individuals showing EDA features in childhood no longer do so in adulthood. However, Gillberg et al.'s (2015) population-based ASD cohort contained only nine individuals who showed indicators of PDA, of whom only two engaged in socially manipulative or shocking behavior to avoid demands. Therefore, more work is needed to examine trajectories of EDA in children with ASD who show high levels of these traits.
Acknowledgements E. O'Nions was supported by a postdoctoral fellowship from the M.M. Delacroix Support Fund. We are extremely grateful to the parents/caregivers who took part in this research for their time and efforts. We thank Jane O'Nions for her helpful comments on the manuscript.
Author's Contribution E.O. contributed to the design of the study, conducted data analyses, and contributed to the writing of the study and the preparation/editing of the manuscript. F.H., E.V., and I.N. contributed to the design of the study, interpretation of results, and editing of the manuscript.

Declarations
Ethics Statement This research was approved by the King's College London Psychiatry, Nursing, and Midwifery Ethics Committee and the KU Leuven Societal and Public Ethics Committee. All procedures performed in studies involving human participants were in accordance with the ethical standards of the King's College London Psychiatry, Nursing, and Midwifery Ethics Committee and the KU Leuven Societal and Public Ethics Committee, and with the 1964 Helsinki declaration and its later amendments.

Conflict of Interest
The authors declare no competing interests.
Informed Consent Statement All participants provided informed consent prior to their participation in the study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.