Introduction

Autism spectrum disorder (ASD) can be thought of as a pattern of quantitative variation along several behavioral domains. In addition to the core ASD behavioral domains of social communication and restricted/repetitive behaviors and interests (RRB), as defined by the Diagnostic and Statistical Manual of Mental Disorders Fifth Edition (DSM-5) (American Psychiatric Association, 2013), autistic individuals often show various degrees of alterations in social cognition, social anxiety, and executive functioning (Johnston et al., 2019; Maddox & White, 2015; Morrison et al., 2019; Spain et al., 2018). A key question for the field is how best to measure these autism spectrum-related dimensional traits in adults. Valid quantitative measures of adult autism spectrum traits are crucial both for genetics studies using quantitative phenotypes and for clinical trials in which investigators need to assess quantitative changes in behaviors following a treatment. One major unanswered question is whether informant-report measures and self-report measures of adult autism spectrum traits provide comparable or different sets of information.

Differences between self-reporting and proxy/informant-reporting for adults have been well studied in fields other than autism research (e.g. dementia and terminal illness) (Roydhouse et al., 2021). Also, in an expansive set of studies, discrepancies between child/adolescent self-reports and informant-reports about family relationships, victimization experiences, and other clinically relevant domains have been shown to predict clinical and behavioral outcomes independent of the scores themselves (e.g. De Los Reyes et al., 2010, 2019; Goodman et al., 2010; Laird & De Los Reyes, 2013). This body of work shows the potential power of reporter discrepancy – beyond identifying a measurement error – as a useful clinical metric (De Los Reyes, 2011; De Los Reyes et al., 2013). However, studies of reporter differences in autism have been limited, especially in adults. There are many reasons why self- and informant-reporting may differ, including (but not limited to) some aspects of the autism phenotype being internal states that are not directly observable, observer bias (e.g. Mandell et al., 2007; Obeid et al., 2020), and possible intentional efforts to mask/camouflage. Most studies of reporter differences in autism have focused on the degree of agreement among multiple informants for autistic children and youth (e.g. Stratis & Lecavalier, 2015). Previous work comparing parent vs. child/adolescent reports of ASD traits and associated symptoms has focused primarily on males and has not examined the effect of sex on self/informant report agreement (Johnson et al., 2009; Kalvin et al., 2020; Kenworthy et al., 2021; Lerner et al., 2012; Schwartzman & Corbett, 2020).

Among the very limited number of published studies comparing self vs. informant reports in adults with diagnosed autism spectrum disorder, Sandercock and colleagues compared self-reports of autistic adults without intellectual disability (ID) vs. informant (caregiver)-report accounts of ASD traits, daily living skills, and quality of life. They found good agreement for ASD traits, yet discrepancy in the reports of daily living skills and quality of life (Sandercock et al., 2020). Additionally, a study in young adult autistic males without ID compared interview, self-report, and parent-report measures and found discrepancies in the following areas: “peer interaction problems”, “difficulties with social cues”, and “narrow interest” (Cederlund et al., 2010). However, there is a need for more studies of self-informant report discrepancies in core autism spectrum traits in adults, as well as studies that examine potential contributors to such discrepancies. In addition to the core autism spectrum traits, other traits associated with the autism spectrum may have a major impact on adult quality of life, including executive functioning difficulties (Bishop-Fitzpatrick et al., 2016; Wallace et al., 2016). A study focusing on reporter differences in executive functioning found poor agreement between self and parent assessments of executive functioning, as measured by the BRIEF, in autistic adolescents, with autistic adolescents self-reporting fewer executive functioning difficulties than their parents reported (Kenworthy et al., 2021). A difference between self-reported and parent-reported executive functioning was not observed in the neurotypical sample (Kenworthy et al., 2021). Additionally, a previous meta-analysis showed that self-report BRIEF scales performed better in terms of clinical ASD discrimination than informant-report BRIEF scales (Leung & Zakzanis, 2014). Given the importance of executive functioning to overall adult functioning and quality of life, self/informant report discrepancies in reporting on executive functioning should be examined among autistic adults to extend the work previously done in an adolescent sample (Kenworthy et al., 2021).

Given the growing literature on sex differences in autism, one factor that might potentially affect self vs. informant report differences in autistic adults is sex assigned at birth. Sex assigned at birth is based on biological and physiological factors in prenatal development and at birth, while gender is socially and personally constructed across postnatal development. This distinction is especially important to be clear about when discussing autism, given the high gender diversity within the autistic population (e.g. George & Stokes, 2018). The vast majority of work has focused on sex assigned at birth (male/female), rather than gender, as a variable of interest. Several reviews have found that autistic males express more RRB than autistic females (Lai & Szatmari, 2020; Mandy & Skuse, 2008; Rubenstein et al., 2015; Van Wijngaarden-Cremers et al., 2014; Werling & Geschwind, 2013). Additionally, Lai and Szatmari point out that autistic females show culturally-defined “female-gender-typical narrow interests”, higher attention to social cues and interest in friendships, and greater linguistic abilities than autistic males, which (among other factors) can lead to delays in recognition and ASD diagnosis in females (Lai & Szatmari, 2020). There is also considerable evidence that autistic females engage in more camouflaging (behaviors that would tend to conceal the ASD diagnosis) than autistic males, both in childhood and adulthood (Dean et al., 2017; Jorgenson et al., 2020; Lai et al., 2017; Schuck et al., 2019; Wood-Downie et al., 2021). However, work has also shown that autistic males (based on sex assigned at birth) as well as autistic cis-gender men and nonbinary individuals (based on gender identity) engage in camouflaging (Hull et al., 2020; Lai et al., 2017). Variation in reported sex differences may be partly attributable to variation among studies in the types of assessments used (i.e. different questionnaires, teacher-report, parent-report, or clinical interview) (Kaat et al., 2021; Mandy & Skuse, 2008; Ratto et al., 2018). Taken together, previous evidence suggests that there are sex and possible gender differences in ASD phenotypes, which are also affected by camouflaging and possibly by assessment methods.

Overall, previous work has been limited in its use of self-report measures, and in examining self-report/informant-report agreement or discrepancy among adults on the autism spectrum. Additionally, previous work has been limited in its exploration of how sex may affect this self-report/informant-report agreement or discrepancy, despite the accumulation of evidence that ASD traits may be expressed and/or viewed differentially on the basis of sex. Clarifying self-report/informant report discrepancies for overall autism spectrum traits as well as executive functioning, and possible effects of sex on these discrepancies, would have very important implications for clinical assessment, quantitative genetics studies, and measurement of treatment outcome.

We sought to test the hypothesis that there would be self/informant report discrepancies regarding core autism spectrum traits and executive functioning in adults who are high in autism spectrum traits, but not in their family members, whom we expected to be lower in autism spectrum traits. In other words, we hypothesized that self/informant report discrepancies would be more likely in a group that is higher in autism spectrum traits. Given the high levels of camouflaging reported in females on the autism spectrum and the identified differences in the ASD phenotype among females compared to males (Frazier et al., 2014; Lai et al., 2017), we also sought to test the hypothesis that there would be greater self-report/informant-report discrepancies in measures of autism-related traits for females on the spectrum than for males. To assess self-report vs. informant-report discrepancies in core autism spectrum-related traits, we used the self- and informant-report versions of the Social Responsiveness Scale-2 Adult (SRS-2A) (Constantino et al., 2003). To assess self-report vs. informant-report discrepancies in executive function, which is often affected in parallel with core autism spectrum traits, we used the self- and informant-report versions of the Behavior Rating Inventory of Executive Function-Adult Version (BRIEF-A) (Donders & Strong, 2016; Rabin et al., 2006; Wilson et al., 2011).

Method

Participants

We recruited 103 adults high in autism spectrum traits as probands and 96 of their family members as part of an autism genetics study. Recruitment was via several sources, including study ads placed on social media and the radio, as well as local mental health clinicians. Study procedures were reviewed and approved by the appropriate Institutional Review Board. The following inclusion criteria were set for probands: (1) clinical and developmental history that documented meeting ASD criteria as defined by DSM-5 (American Psychiatric Association, 2013) and (2) verbal IQ above 70, as estimated by the Shipley-2 (Western Psychological Services, 2009). Verbal IQ values for probands in the current sample ranged from 76 to 144, with a mean of 118.3 and standard deviation of 14.6. These scores, interpreted qualitatively, indicate that probands were largely in the average to well above average verbal IQ range. To increase the capacity of individuals to participate, clinical and developmental history was collected via an extended (typically 1–2 h) semi-structured telephone interview conducted by one of the members of the research team supervised by the principal investigator. The interview was based on the diagnosis/intake questions used by the principal investigator in their clinical work as a psychiatrist specializing in autism in adulthood. Moreover, detailed information was gathered on psychiatric history, social communication behavioral history (e.g. eye contact, understanding nonliteral language and nonverbal social cues), RRB history (e.g. strong interests, repetitive behaviors, routines), sensory behavioral history (e.g. sensory hypersensitivity, hyper- or hypo-sensitivity to pain), treatment history, medication history, and genetic testing. Additionally, questions on developmental history, including details on pregnancy and child behavior development (e.g. mimicry of behavior, eye contact, motor coordination, imaginative play), were asked of a parent, caregiver, or other informant who knew the proband well as a child, when possible (n = 86), and of the proband when no informant was available (n = 17). This information was integrated with any prior clinical reports that participants could provide. Because information was collected remotely in many cases, and partially during the COVID-19 pandemic, the Autism Diagnostic Observation Schedule (ADOS) was not conducted.

Information from the phone screen, in combination with prior clinical records and the Social Communication Questionnaire (SCQ, see below), was reviewed in a case conference including the research team and the principal investigator, a psychiatrist specializing in adult ASD, to determine if the potential proband met DSM-5 criteria for ASD and therefore was eligible for enrollment. Because not all probands had a prior clinical diagnosis of ASD, and gold-standard, in-person diagnostic assessments could not be conducted, we refer to probands as “high in ASD traits” rather than definitively having ASD diagnoses. Exclusion criteria for participation in the study were: (1) history of intellectual disability (ID), (2) recent (last 4 weeks) severe mood or psychotic symptoms, (3) recent severe aggressive or self-injurious behaviors, and (4) history of major neurological disorder (e.g. dementia, severe head trauma, recent seizures). Family members were included on the basis of their relationship to the probands and included first-, second-, third-, and fourth-degree relatives. The exclusion criteria for family members were: (1) recent severe mood or psychotic symptoms and (2) recent severe aggressive or self-injurious behaviors. Additionally, only family members who did not report any psychiatric diagnoses, neurological diagnoses, or neurodevelopmental disorders in the medical history battery were included in analyses (n = 96). Demographic information (i.e. sex assigned at birth, age, race, and gender identity) was self-reported by each participant during a telephone interview and through an online questionnaire. Sample demographics and sample size are reported in Table 1.

Table 1 Demographics data for participants and their informants are reported for probands and family members separately

Measures

Measures included a screening questionnaire, the Social Communication Questionnaire (SCQ), as well as two additional questionnaires: the Social Responsiveness Scale-2 Adult (SRS-2A) and the BRIEF-Adult (BRIEF-A). The SCQ was collected as an informant-report only for probands, if a parent was available to complete it. The SRS-2A and BRIEF-A were collected as both self-report (participant answering questions about themselves) and informant-report (another person answering questions about the participant) versions. For the informant-report versions of the SRS-2A and BRIEF-A, the informants varied in their relationship to the probands and family members and included parents, siblings, offspring, therapists, friends, children, and spouses. Relationship information for informants is included in Supplementary Materials. The informant for each participant was selected in collaboration with the participant, based on who knew the participant the best, was available, and was ≥ 18 years old. For probands, there were 106 unique informants (greater than the number of proband participants because sometimes the informant for the SRS-2A was different than the informant for the BRIEF-A). Note that unique informants are defined by the identity of the informant, not the participant, so if a participant had different people complete their SRS-2A and BRIEF-A, those people would count as two unique informants. In contrast, if an informant completed reports for multiple participants, they would still count as only one unique informant. Of these informants, 17 were enrolled as family member participants. For family members, there were 83 unique informants (fewer than the number of family member participants because some informants rated multiple people – an average of 1.18 other people). Of those informants, 46 were also enrolled as family members (so they contributed a self-report on their own behavior as well as an informant-report on another participant). Additionally, 8 of the informants on family members were enrolled as probands in the present study.

SCQ

The SCQ is an informant-report measure designed as a diagnostic tool for autism and pervasive developmental disorder (Berument et al., 1999). The SCQ was developed as a companion screening measure for the Autism Diagnostic Interview – Revised (ADI-R). The SCQ items were deliberately chosen to match the ADI-R items that were found to have discriminative diagnostic validity. A meta-analysis of the use of the SCQ as a screening tool found that it had acceptable accuracy for the identification of ASD (AUC = 0.827) (Chesnut et al., 2017). In this study, the SCQ was used as one of several sources of information to determine eligibility for individuals to participate as probands in the study. Probands had a mean SCQ score of 14.6 with a standard deviation of 7.6.

SRS-2A

The SRS-2A is a 65-question measure, available as both informant-report and self-report, composed of five subscales measuring social cognition, social communication, social motivation, RRB, and social awareness. The SRS has good agreement with the ADI-R across multiple symptom domains (r = 0.60–0.79) as well as good inter-rater reliability (r = 0.75–0.91) (Constantino et al., 2003). In addition to being used as a diagnostic tool, the SRS has been used to quantify autism-related behaviors in the general population (Constantino & Todd, 2003). In this study, it was collected as both a self-report and informant-report measure for both probands and family members. The raw total SRS-2A score was used in analyses.

BRIEF-A

The BRIEF-A is a 75-question measure of executive functioning. Executive functioning subdomains measured in the BRIEF-A include the following abilities: inhibit, shift, emotional control, initiate, working memory, plan/organize, organization of materials, and monitor (Roth et al., 2005). The BRIEF has good reliability with an internal consistency of 0.80–0.98 across multiple raters and with a test–retest reliability of 0.76–0.85 (Gioia et al., 2000). The BRIEF has been used in both autistic and non-autistic populations (e.g. Donders & Strong, 2016; Granader et al., 2014; Kenworthy et al., 2021; Rabin et al., 2006; Wilson et al., 2011). In this study, it was collected as both a self-report and informant-report measure for both probands and family members. The raw Global Executive Composite score was used for analysis.

Assessing Demographic Differences Between Probands and Family Members

Possible confounding demographic differences between probands and family members were examined using a two-sample t-test to compare mean age. Additionally, Pearson’s chi-square test was used to evaluate differences in proportions of sex, education level, and informant sex, and Fisher’s exact test was used for race. Fisher’s exact test was used for comparing the proportions of different racial identities because multiple categories had small or zero counts, which prevented precise estimation of the chi-square statistic or p-value using Pearson’s chi-square test.
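As an illustration of the two proportion tests named above, the following minimal Python sketch computes Pearson's chi-square and a two-sided Fisher's exact test for a 2 × 2 table using only the standard library. The function names and the example counts are hypothetical and are not taken from the study. The race comparison in the study involved more than two categories, which this 2 × 2 simplification does not cover, and the software actually used for the analyses is not stated here.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). With 1 degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical counts: rows = group (proband, family member),
# columns = category (e.g. female, male). Not study data.
example = [[12, 8], [5, 15]]
chi2_stat, chi2_p = pearson_chi_square_2x2(example)
fisher_p = fisher_exact_2x2(example)
```

The exact test enumerates every table compatible with the observed margins, which is why it remains usable when expected cell counts are too small for the chi-square approximation.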

Examining Agreement and Inter-Rater Reliability of Self- and Informant-Report SRS-2A and BRIEF-A Scores Using Correlation Analysis

Agreement was visualized using Bland–Altman plots and tested using (1) Spearman correlation between self- and informant-report versions of the same questionnaires measuring autism spectrum-related behaviors (SRS-2A) and executive functioning (BRIEF-A) and (2) intra-class correlation (ICC) analysis. Spearman correlation coefficients were used because the data had non-normal, varied distributions. ICC was used to quantify inter-rater reliability between self- and informant-report for the SRS-2A raw total score, as well as for the BRIEF-A raw Global Executive Composite score. A one-way random effects model with absolute agreement as the output was run first to assess the validity of a single score. Raw scores were used to test the relationship of the scores with sex and age without any possible obscuring via T-score transformation. Analyses were conducted for probands and family members separately. The Benjamini–Hochberg correction for multiple comparisons was used. Exploratory analyses comparing correlation strengths were conducted using Fisher r-to-z transformation.
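The one-way random effects, single-score ICC described above can be sketched as follows. This is a minimal standard-library illustration of the textbook ICC(1,1) formula, assuming complete pairs of scores. The function name and the toy ratings are hypothetical, and the sketch is not the authors' analysis code.

```python
def icc_one_way(ratings):
    """ICC(1,1): one-way random effects, absolute agreement, single score.

    `ratings` holds one row per participant; each row contains the k
    scores given to that participant (here k = 2: self and informant).
    Returns (icc, f_statistic, df_between, df_within).
    Assumes at least two participants and some variation in the scores.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-participant and within-participant sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means) for x in row)

    df_between = n - 1
    df_within = n * (k - 1)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    f_stat = ms_between / ms_within if ms_within > 0 else float("inf")
    return icc, f_stat, df_between, df_within


# Toy ratings (self, informant) for three hypothetical participants.
toy = [[1, 2], [3, 4], [5, 6]]
icc, f_stat, df1, df2 = icc_one_way(toy)
```

Because raters are not modeled as a separate factor, any systematic offset between self- and informant-report lowers this coefficient, which is the sense in which it tests absolute agreement. With two scores per participant, the F test has degrees of freedom (n − 1, n).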

Comparing Self- vs. Informant-Report Discrepancies Between Groups

For both the SRS-2A and the BRIEF-A, discrepancies between the self- and informant-reports were quantified as discrepancy scores, which were calculated by subtracting the self-report score from the informant-report score. Positive discrepancy scores indicate that the informant-report score was higher than the self-report score, while negative discrepancy scores indicate that the self-report score was higher than the informant-report score. Following tests for normality and for equal variance, analysis of covariance (ANCOVA) was used to compare the discrepancy scores for SRS-2A and BRIEF-A between groups, as defined immediately below, while accounting for the potential confounding variable(s). Analyses were conducted for probands and family members separately, investigating first the effect of the sex of the individual self-reporting and being reported on (referred to as participant sex). Exploratory analyses examined the effect of gender identity of the individual self-reporting and being reported on (referred to as participant gender) and effect of the sex of the informant (referred to as informant sex) on discrepancy. Gender identity was not reported for informants, so informant gender was not examined. The Benjamini–Hochberg correction for multiple comparisons was used.
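The sign convention for discrepancy scores, together with the mean difference and 95% limits of agreement drawn on a Bland–Altman plot, can be sketched as follows. This is a minimal standard-library illustration. The function names and example scores are hypothetical, and the limits are assumed to be the conventional mean ± 1.96 standard deviations.

```python
import statistics


def discrepancy_scores(self_scores, informant_scores):
    """Informant-report minus self-report, one value per participant.

    Negative values mean the self-report score was the higher one.
    """
    return [informant - own
            for own, informant in zip(self_scores, informant_scores)]


def bland_altman_summary(self_scores, informant_scores):
    """Mean discrepancy, its sample SD, and 95% limits of agreement."""
    diffs = discrepancy_scores(self_scores, informant_scores)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    half_width = 1.96 * sd_diff
    return {
        "mean": mean_diff,
        "sd": sd_diff,
        "lower": mean_diff - half_width,
        "upper": mean_diff + half_width,
    }


# Hypothetical raw scores for three participants (not study data).
summary = bland_altman_summary([100, 80, 60], [90, 85, 40])
```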

Results

Examining Agreement and Inter-Rater Reliability of Self- and Informant-Report

For the SRS-2A in probands, the mean discrepancy score was −18.50 points (indicating higher self-report SRS-2A scores relative to informant-report SRS-2A scores, on average) with a standard deviation of 42.28, an upper 95% limit of agreement of 64.38, and a lower 95% limit of agreement of −101.38 (see Fig. 1). For the BRIEF-A in probands, the mean discrepancy score was −11.24 points with a standard deviation of 35.65, an upper 95% limit of agreement of 58.64, and a lower 95% limit of agreement of −81.13. The range, mean, and standard deviation of the raw scores from which discrepancy scores were calculated are reported in Table 2. In Spearman correlation analysis among probands, there was no significant association between the self-report and informant-report total scores on the SRS-2A (r = 0.08, p > 0.05), nor was there a significant association between the self-report and informant-report scores on the BRIEF-A (r = 0.07, p > 0.05). This lack of significant correlation between self-report and informant-report versions of the same measures suggests that who is reporting (self vs. informant) has a strong impact in these domains (see Fig. 1). Additionally, in intra-class correlation (ICC) analysis among probands, there was poor inter-rater reliability between self-report and informant-report for the SRS-2A (ICC = 0.01, 95% confidence interval (CI) [−0.19, 0.20]) and for the BRIEF-A (ICC = 0.00, 95% CI [−0.19, 0.20]). Neither the ICC coefficient for SRS-2A nor the ICC coefficient for BRIEF-A was significantly different from zero (F(97,98) = 1.01, p > 0.05; F(97,98) = 1.01, p > 0.05).

Fig. 1

Lack of agreement between self-report and informant-report scores for the same measures in adults high in autism spectrum traits. A, B Bland–Altman plots. Difference between measurements is calculated by subtracting the self-report score from the informant-report score. Average measurement is calculated by taking the average of the self-report and the informant-report score. Mean difference between measurements (i.e. discrepancy) is shown with a solid black line. The dashed red lines represent 95% upper and lower limits of agreement for the measures. A Bland–Altman plot for self-report and informant-report total raw scores for the SRS-2A in probands. Mean discrepancy (shown by the solid black line) is below zero for the SRS-2A for probands. B Bland–Altman plot for self-report and informant-report scores for the BRIEF-A raw Global Executive Composite (GEC) score. Mean discrepancy (shown by the solid black line) is below zero for the BRIEF-A. C Correlation between self- and informant-report scores on the SRS-2A in probands. Spearman's rho and the associated p-value are reported. There was no significant correlation between self- and informant-report scores for the SRS-2A total for probands. D Correlation between self- and informant-report scores on the BRIEF-A GEC in probands. Spearman's rho and the associated p-value are reported. There was no significant correlation between self- and informant-report scores for the BRIEF-A GEC for probands. SRS-2A, Social Responsiveness Scale-2 Adult; BRIEF-A, Behavior Rating Inventory of Executive Function-Adult Version

Table 2 Range, mean, and standard deviation of the scores for the measures collected as self-report and informant-report (SRS-2A and BRIEF-A) and later used in discrepancy analyses

For the SRS-2A in family members, the mean discrepancy score was 0.02, with a standard deviation of 26.76, an upper 95% limit of agreement of 52.47, and a lower 95% limit of agreement of −52.42 (see Supplement Fig. 1). For the BRIEF-A in family members, the mean discrepancy score was −5.64, with a standard deviation of 27.47, an upper 95% limit of agreement of 48.20, and a lower 95% limit of agreement of −59.48. In contrast to probands, in Spearman correlation analysis among family members there was a moderate association between the self-report and informant-report total scores for the SRS-2A (r = 0.38, p < 0.05) and for the BRIEF-A (r = 0.34, p < 0.05) (see Supplement Fig. 1). In intra-class correlation (ICC) analysis among family members, there was poor inter-rater reliability between self-report and informant-report for the SRS-2A (ICC = 0.34, 95% confidence interval (CI) [0.15, 0.51]) and for the BRIEF-A (ICC = 0.26, 95% CI [0.06, 0.44]). In contrast to the probands, the ICC coefficient for the SRS-2A and the ICC coefficient for BRIEF-A were significantly different from zero (F(91,92) = 2.03, p < 0.001; F(91,92) = 1.7, p < 0.01).
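The exploratory comparison of correlation strengths between two independent groups via the Fisher r-to-z transformation can be sketched as follows. This is a minimal standard-library illustration of the usual two-sample z test. The function name and the input values are hypothetical, and no result of the present study is computed here. The formula is standard for Pearson correlations and is only an approximation when applied to Spearman coefficients.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent
    correlation coefficients using Fisher's r-to-z transformation.

    Requires |r| < 1 and n > 3 in each group.
    Returns (z_statistic, p_value).
    """
    z1 = math.atanh(r1)  # Fisher transformation of each coefficient
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p
    return z, p_value


# Hypothetical inputs: r = 0.50 in one group, r = 0.00 in another,
# each with 103 pairs of scores.
z_stat, p_val = compare_independent_correlations(0.50, 103, 0.00, 103)
```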

Comparing Discrepancy Between Groups

Results for comparing potentially confounding demographic variables between groups are in Table 1. After identifying age as a potentially confounding variable, we investigated the impact of participant sex, participant gender, and informant sex on discrepancy scores for both the SRS-2A and BRIEF-A for probands and family members separately. Among probands, females (M = −30.1, SD = 36.4) had SRS-2A discrepancy scores of significantly greater magnitude (directionally more negative) than males (M = −10.8, SD = 43.9) while accounting for participant age (F(1,100) = 6.66, p < 0.05), with a medium effect size of participant sex (η2 = 0.06; see Fig. 2A). Recall that negative discrepancy scores indicate higher self-report SRS-2A scores relative to informant-report SRS-2A scores. There was not a statistically significant sex effect for BRIEF-A discrepancy scores in probands (F(1,99) = 4.00, p = 0.05; η2 = 0.04). For family members, there were no differences based on sex on the SRS-2A (F(1,95) = 0.17, p > 0.05; η2 = 0.00) or on the BRIEF-A (F(1,91) = 0.02, p > 0.05; η2 = 0.00) (see Supplementary Fig. 2A & C). For descriptive and exploratory purposes, we examined the effect of participant gender on discrepancy within our sample. Among probands, there was an effect of gender identity on SRS-2A discrepancy (F(4,97) = 2.75, p < 0.05) with a medium effect size of η2 = 0.10 and a statistically non-significant effect on BRIEF-A discrepancy (F(4,96) = 1.69, p > 0.05) with a medium effect size of η2 = 0.06 (see Fig. 2B and E). For family members, gender identity was not systematically collected, so no additional analyses were run.
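The structure of an ANCOVA with one two-level factor and one covariate, as used above for participant sex adjusted for age, can be sketched as a comparison of two nested least-squares models. This is a minimal standard-library illustration. The function names and toy data are hypothetical, the partial form of η2 is an assumption (the form of η2 is not specified in this section), and no p-value is computed because the standard library has no F distribution.

```python
def _ols_rss(columns, y):
    """Residual sum of squares of y regressed on the given columns.

    Solves the normal equations by Gauss-Jordan elimination; assumes
    the columns are not collinear.
    """
    p, n = len(columns), len(y)
    aug = []
    for i in range(p):
        row = [sum(columns[i][t] * columns[j][t] for t in range(n))
               for j in range(p)]
        row.append(sum(columns[i][t] * y[t] for t in range(n)))
        aug.append(row)
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(p):
            if r != i:
                factor = aug[r][i] / aug[i][i]
                aug[r] = [x - factor * z for x, z in zip(aug[r], aug[i])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    return sum((y[t] - sum(beta[i] * columns[i][t] for i in range(p))) ** 2
               for t in range(n))


def ancova_group_effect(y, group, covariate):
    """F test for a 0/1 group factor adjusted for one covariate.

    Returns (f_statistic, df_effect, df_error, partial_eta_squared).
    """
    n = len(y)
    ones = [1.0] * n
    rss_reduced = _ols_rss([ones, covariate], y)       # covariate only
    rss_full = _ols_rss([ones, covariate, group], y)   # covariate + group
    df_error = n - 3  # intercept, covariate, and group are estimated
    ss_effect = rss_reduced - rss_full
    f_stat = ss_effect / (rss_full / df_error)
    partial_eta_sq = ss_effect / (ss_effect + rss_full)
    return f_stat, 1, df_error, partial_eta_sq


# Toy data (not study data): discrepancy scores, 0/1 group, age.
toy_y = [3, 3, 1, 7, 5, 5]
toy_group = [0, 1, 0, 1, 0, 1]
toy_age = [20, 20, 30, 30, 40, 40]
result = ancova_group_effect(toy_y, toy_group, toy_age)
```

With 103 observations this layout yields error degrees of freedom of 100, which is consistent with the form of the F(1,100) reported for the SRS-2A in probands.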

Fig. 2

Effects of participant sex, participant gender, and informant sex on discrepancy between self- and informant-report scores for probands. There was a significant effect of participant sex and of informant sex on discrepancy scores in the SRS-2A (calculated from total raw scores) in probands, with a marginal effect of participant gender. However, there were no significant effects of participant sex, participant gender, or informant sex on discrepancy scores for the BRIEF-A raw Global Executive Composite (GEC) score. Discrepancy scores were calculated by subtracting the self-report scores from informant-report scores. A discrepancy score of 0 indicates no discrepancy between self- and informant-reports. Negative discrepancy scores indicate higher self-report scores relative to informant-report scores. Conversely, positive discrepancy scores indicate higher informant-report scores than self-report scores. A Significant sex differences in SRS-2A raw total discrepancy scores for probands. B Marginal effect of gender identity on SRS-2A raw total discrepancy scores for probands. C Significant effect of informant sex on SRS-2A raw total discrepancy scores for probands. D No significant sex differences in BRIEF-A GEC discrepancy for probands. E No significant effect of participant gender on BRIEF-A GEC discrepancy for probands. F No significant effect of informant sex on BRIEF-A GEC discrepancy for probands. * indicates p < 0.05 after correction for multiple comparisons using the Benjamini–Hochberg correction. SRS-2A, Social Responsiveness Scale-2 Adult; BRIEF-A, Behavior Rating Inventory of Executive Function-Adult Version; GEC, Global Executive Composite score of the BRIEF-A

We also examined the effect of informant sex on discrepancy for exploratory purposes. When reporting on probands, male informants (M = −42.7, SD = 37.4) and female informants (M = −11.1, SD = 37.7) differed in SRS-2A discrepancy scores (F(1,77) = 9.89, p < 0.01) with a medium effect size of informant sex (η2 = 0.11; see Fig. 2C). Generally, SRS-2A discrepancy scores were greater in magnitude with male informants and were in the negative direction, indicating higher levels of self-reported symptoms by probands relative to informant-reported symptoms when informants were males. There were no significant effects of informant sex on discrepancy scores for the BRIEF-A among probands (F(1,90) = 3.90, p > 0.05; η2 = 0.04), the SRS-2A among family members (F(1,90) = 1.12, p > 0.05; η2 = 0.01), or the BRIEF-A among family members (F(1,88) = 0.91, p > 0.05; η2 = 0.01) (see Fig. 2F and Supplementary Fig. 2).
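The Benjamini–Hochberg correction applied to the tests above can be sketched as follows. This is a minimal standard-library illustration of the standard step-up procedure expressed as adjusted p-values. The function name and example p-values are hypothetical, and how the tests were grouped into families for correction is not specified in this section.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each sorted p-value is scaled by m / rank, then a running minimum
    from the largest rank downward keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four tests (not study results).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

A test is then declared significant at the 0.05 level when its adjusted p-value is below 0.05, which controls the expected proportion of false discoveries rather than the probability of any single false positive.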

Discussion

We found a lack of agreement and inter-rater reliability between self-report and informant-report scores for the same measures for probands, yet moderate agreement and low inter-rater reliability between self-report and informant-report measures in their family members. Additionally, we found a pattern of negative discrepancy scores between self and informant-reporting of autism-related behaviors for female probands, such that female probands reported more autism-related behaviors for themselves than their informant did about them. In exploratory analyses, we found a difference in discrepancy in reporting autism-related behaviors of probands according to the sex of the informant. Specifically, SRS-2A discrepancy scores were of greater magnitude and in a negative direction with male informants, indicating higher levels of self-reported symptoms by probands relative to informant-reported symptoms when informants were males.

Our findings related to discrepancy in reporting autism spectrum traits build on work previously done in child samples finding parent–child reporting discrepancies (e.g. Lerner et al., 2012) but differ from the small number of previous conflicting reports in autistic adults. One previous study found good self/other agreement on the SRS-2A among autistic adults (80% male sample) (Sandercock et al., 2020), while another study (with a male-only sample) reported poor self/other agreement but in the opposite direction of what we observed in our sample, with men reporting fewer ASD symptoms than their informants did (Cederlund et al., 2010). Much of the previous work across all age ranges examining the agreement between self- and informant-report measures has relied on predominantly male participants (> 70%) and has found either good agreement or lower reporting of ASD symptoms by self-report (Cederlund et al., 2010; Johnson et al., 2009; Lerner et al., 2012; Sandercock et al., 2020; White et al., 2012). In contrast, our sample had a relatively high representation of female probands (46.3%). Informants’ lower reporting of autism spectrum traits in female probands in our sample potentially could be related to camouflaging of ASD behaviors by probands. This would be in line with camouflaging work showing that while autistic individuals across sexes and gender identities camouflage, there seem to be higher rates of camouflaging among women (e.g. Hull et al., 2017; Lai et al., 2017). It also may be due to sex differences in the expression of autism-related behaviors in males and females (e.g. Lai & Szatmari, 2020) leading to informants identifying fewer autism-related traits in women. The impact of gender on discrepancy cannot be fully investigated in the present study. Given the lack of enrichment of trans and non-binary individuals, our sample lacks the necessary statistical power to do so. 
Recent work has taken multiple approaches to examine the intersection of gender diversity and the autism spectrum (George & Stokes, 2018; Manjra & Masic, 2022; Moore et al., 2022; Strang et al., 2020; Warrier et al., 2020). Future studies of reporter discrepancy, and of autism more broadly, are needed with larger numbers of trans and non-binary individuals, which would enable investigators to assess the effects of both gender and sex assigned at birth.

While female probands, on average, had discrepancies that were greater in magnitude than those of male probands, our data demonstrate that many probands – male and female – had large discrepancies between self-report and informant-report scores. This suggests that self-reports and informant-reports may be carrying different sets of information for autistic adults. Recent work looking at ASD behaviors from childhood to young adulthood suggests that self-reports may be especially important in adults (Riglin et al., 2021). Riglin et al. (2021) focused on identifying trajectories of change and/or maintenance of levels of ASD traits. They found that, by age 25 years, there were parent-reported differences between trajectory groups but not self-reported differences, and concluded that incorporating self-report assessment, as well as a variety of measures, may be important for accurately assessing ASD traits in autistic adults.

The lack of agreement and inter-rater reliability between self- and informant-reporting of executive functioning using the BRIEF among probands is in contrast to the studies examining self/other agreement in some other populations, but our results are generally aligned with findings in autistic adolescents (Donders & Strong, 2016; Kenworthy et al., 2021; Rabin et al., 2006; Wilson et al., 2011). However, the levels of concordance for executive function found among autistic adolescents in Kenworthy et al. (2021) were higher than those found in the present sample. This may be due to the difference in the mean age of the two samples, and/or to methodological differences between the present study and Kenworthy et al. (2021), including the following: (1) the studies use different versions of the BRIEF (BRIEF-2 child form in Kenworthy et al. vs. BRIEF-A adult form in the current study), (2) the studies use different forms of the GEC score (T-score transformed in Kenworthy et al. vs. raw score in the present study), and (3) ICC calculation methods vary, with the Kenworthy et al. study using a two-way random effects model testing for consistency (which cancels out systematic rating errors) and using an average score (Kenworthy et al., 2021). In contrast, the present study tested for absolute agreement (i.e. whether the raters produce the same score) to assess the validity of a single score, reflecting the field's current practice of relying on one reporting method for adults (either self-report or informant-report). Our results emphasize the importance of collecting both self- and informant-report information in order to capture the full expanse of autism spectrum-related behaviors and abilities, including executive functioning.
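The practical effect of the third methodological difference can be illustrated with a minimal sketch that uses the standard two-way ANOVA mean-square formulas for the ICC. The function name, dictionary keys, and scores below are hypothetical; the code is an illustration of the two ICC forms under those assumptions, not the analysis code used in either study. In the example, each informant score is exactly 20 points below the corresponding self-report score, which is a purely systematic difference between reporters.

```python
from statistics import mean

def icc_two_way(scores: list[list[float]]) -> dict[str, float]:
    """Two ICC forms from a subjects-by-raters score table.

    scores: one row per participant, one column per reporter
            (e.g. [self_score, informant_score]).
    Returns the single-score absolute-agreement ICC and the
    average-score consistency ICC.
    """
    n = len(scores)      # number of participants
    k = len(scores[0])   # number of reporters
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for a two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-participant mean square
    msc = ss_cols / (k - 1)                # between-reporter mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return {
        # Absolute agreement, single score: reporter offsets lower the value.
        "absolute_single": (msr - mse)
        / (msr + (k - 1) * mse + k * (msc - mse) / n),
        # Consistency, average score: reporter offsets are ignored.
        "consistency_average": (msr - mse) / msr,
    }

# Hypothetical data: informant scores are always 20 points lower.
example = [[60, 40], [70, 50], [80, 60], [90, 70], [100, 80]]
result = icc_two_way(example)
print(result)
```

For these hypothetical scores, the consistency form equals 1.0 because the reporters rank participants identically, whereas the absolute-agreement form is 5/9 (about 0.56) because the constant 20-point offset counts as disagreement. ICC values computed under the two definitions are therefore not directly comparable.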

There are many possible sources for the self- vs. informant-report discrepancies in autistic adults. Informants may lack understanding due to a neurotypical viewpoint, in line with the concept of dialectical misattunement between neurotypical and autistic individuals or the double empathy problem, i.e. the idea that social communication difficulties are not solely reliant on the autistic individual’s inherent social ability, but are also dependent on their neurotypical social partner (Bolis et al., 2017; Milton, 2012). The effect of informant sex in particular suggests that the interpretation of autism-related behaviors may be more difficult when the informant does not share social context with the self-reporting participant (i.e. the informant and the participant are of different sexes). Additionally, informants may lack awareness of traits/thoughts that are not easily observable, have bias towards over- or under-assignment of autism spectrum traits, or have other factors influencing how they report. On the other side, the individual self-reporting may actively camouflage their behaviors. The self-reporter may also possess greater or lesser degrees of self-awareness or have individual bias in the way they view themselves that could affect their self-reports (Huang et al., 2017).

Among family members in the present study, there was poor agreement between self and informant report measures as measured by ICC, which was significantly different from zero but lower than values found in previous samples estimating self/other agreement for parents of autistic individuals (De la Marche et al., 2015; Möricke et al., 2016). It may be that the variability in the types of family members studied (not just parents of autistic children, as in some previous studies) and variability in the relationship between the family member and their informant contributed to lower self/informant report agreement compared to previous work. An additional consideration in the assessment of self/informant report correlations in probands vs. family members is that it is possible that differences between probands and family members in phenotype variability could affect agreement, e.g. that lower variability (generally high SRS and BRIEF scores) in proband phenotypes might partially account for the lack of self/informant report correlations in probands. But because variability in phenotypes in probands was fairly robust in our sample (with higher SRS and BRIEF standard deviations seen in probands than in family members, as shown in Table 2), this does not seem to account for the findings in our dataset. Nevertheless, this is an issue that deserves further research in future samples, using one or more additional measures of autism-related behaviors.

A limitation of this study is the lack of diversity related to certain demographics – namely gender identity, race, and education level. While our proband sample had a variety of gender identities reported, not enough non-binary and transgender people were included to investigate an effect of gender on discrepancy. A more balanced sample in terms of education level (as a rough proxy for socioeconomic status) and racial identity is necessary to ensure generalizability of the results. Additionally, this sample included only those with a verbal IQ above 70 and cannot address the reporting/phenotype collection challenges related to those with lower IQ and/or intellectual disability. Another limitation of this study is the lack of consistency in the informant’s relationship to the participant. While we secured informant reports from parents or other close family members whenever possible, this was not possible in all cases, as some participants had family members who were unavailable (e.g. uncomfortable with participating). Incorporating probands in the study who did not have a parent informant allowed for broader inclusion but likely added variability and inconsistency in the type of knowledge and experience each informant had with the proband. This challenge in securing an informant with a consistent relationship to the proband seems to be specific to research involving autistic adults (as opposed to research with autistic children in which a parent, caregiver, and/or teacher is often available) and is another reason to collect both self-report and informant-report data in adults. Additionally, as described in the Methods and further detailed in Supplementary Materials, the informants were not from an entirely independent group.
Even with this overlap in participants and informants in which many informants also served as family member participants, the discrepancy between self-report and informant-report scores for probands observed (mean of −18.5 points for SRS-2A and −11.2 points for BRIEF-A) is still a concerning observation and an area for future consideration and study.

To extend this study’s findings regarding self-report vs. informant-report discrepancies, future studies should investigate possible contributions to these discrepancies, including camouflaging, potential biases when reporting on autism-related behaviors, degree of shared social context of probands and informants, and the impact of an informant’s general autism-related knowledge on their reporting. Future work should also look for possible differences in agreement/discrepancy across different domains of autism-related traits (such as perspective taking, social engagement, repetitive motor behaviors), as these domains will vary in the degree to which an informant is able to observe them. Additionally, future work should examine the influence of different domains of cognition and/or behavior on discrepancy (for instance, do higher perspective-taking abilities of the autistic adult or the informant relate to decreased self/informant discrepancy?). Given the known variation in autism-related traits across sexes, this may be an especially important avenue to help disentangle whether the group effects observed here are in fact due to sex, are a simplified gender effect, are measurement or sampling effects, or emerge from different distributions of cognitive and behavioral phenotypes across genders. Future research should also look at how self and other reports align with diagnostic histories (e.g., age of ASD diagnosis, experience of misdiagnosis) and treatment histories (e.g., type of treatment received, when treatment began, etc.). Finally, looking at agreement in self-informant reporting for multiple measures in each domain, as well as agreement of self- and informant-reported measures with clinician ratings and laboratory performance-based measures, will be an important check on the generalizability of the results of the present study.
The presence of the discrepancies in the present study suggests that it is vital to use both self-report and informant-report measures in future research studies and clinical assessments, as they carry different sets of information, both of which are important. Not collecting self-report information for autistic adults may lead to missing important information about their experiences and phenotype.

Citation Diversity Statement

Recent work in several fields of science has identified a bias in citation practices such that papers from women and other minority scholars are under-cited relative to the number of such papers in the field (Caplar et al., 2017; Dion et al., 2018; Dworkin et al., 2020; Maliniak et al., 2013; Mitchell et al., 2013). Here we sought to proactively consider choosing references that reflect the diversity of the field in thought, form of contribution, gender, race, ethnicity, and other factors. First, we obtained the predicted gender of the first and last author of each reference by using databases that store the probability of a first name being carried by a woman (Dworkin et al., 2020; Zhou et al., 2020). By this measure (and excluding self-citations to the first and last authors of our current paper), our references contain 27.94% woman (first)/woman (last), 18.33% man/woman, 27.06% woman/man, and 26.67% man/man. This method is limited in that a) names, pronouns, and social media profiles used to construct the databases may not, in every case, be indicative of gender identity and b) it cannot account for intersex, non-binary, or transgender people. Second, we obtained the predicted racial/ethnic category of the first and last author of each reference by using databases that store the probability of a first and last name being carried by an author of color (Ambekar et al., 2009; Sood & Laohaprapanon, 2018). By this measure (and excluding self-citations), our references contain 6.29% author of color (first)/author of color (last), 18.85% white author/author of color, 20.76% author of color/white author, and 54.11% white author/white author. This method is limited in that a) the names and Florida Voter Data used to make the predictions may not be indicative of racial/ethnic identity, and b) it cannot account for Indigenous and mixed-race authors, or those who may face differential biases due to the ambiguous racialization or ethnicization of their names.
We look forward to future work that could help us to better understand how to support equitable practices in science.