Autism spectrum disorder (ASD) is a highly heterogeneous neurodevelopmental disorder affecting at least 1 in 59 children in the US (Baio et al. 2018), or 1 in 132 people worldwide (Baxter et al. 2015). Individuals with ASD exhibit impaired social interaction and communication, restricted interests, repetitive behaviors, and unusual sensory responses (American Psychiatric Association 2013). Current treatment options rely heavily on behavioral interventions (e.g., applied behavior analysis) that aim to foster learning and skill development as well as to manage maladaptive behaviors, while medications such as atypical antipsychotics, psychostimulants, guanfacine, and other medications that are unapproved for ASD, such as selective serotonin reuptake inhibitors and anti-convulsants, are frequently employed to reduce associated irritability/aggression and inattentive/hyperactive behaviors (Zwaigenbaum et al. 2015). However, core ASD deficits tend to persist despite intervention, and no pharmacotherapies possess proven efficacy at reducing the core symptoms of deficits in social interaction and communication, and restricted and repetitive behaviors (Ji and Findling 2015).

Drug development for the core social deficits of ASD faces various challenges: first, as for many neuropsychiatric disorders, a lack of surrogate markers (i.e., biomarkers) able to detect therapeutic efficacy is a key obstacle (Anagnostou et al. 2015; Baxter et al. 2015; Brugha et al. 2015; Zwaigenbaum et al. 2015, 2013). Common approaches available to quantify social communication deficits in individuals with ASD were not developed with the intent for use in ASD and are cumbersome and subject to bias, as they are based on caregiver report (Anagnostou et al. 2015). A second challenge unique to neurodevelopmental disorders like ASD concerns the fact that the initial evaluation of novel compounds usually takes place in clinical trials in adults, rather than in trials in the ultimate optimal target population of children and adolescents. However, deficits that are commonly described in children and adolescents with ASD in social cognition, for example, skills of empathy, imagination, theory of mind (TOM; beliefs, desires, intentions, and perspectives), social pragmatics and advanced language skills (Williams White et al. 2007) are known to show changes across different stages of development in longitudinal studies (Sarrett and Rommelfanger 2015). Thus, the profiles of abnormalities and corresponding surrogate markers for therapeutic efficacy may be different across the life-span. In addition, clinical heterogeneity of ASD is presented in a variety of symptom profiles, severity (Lai et al. 2013) and levels of intellectual and functional communication ability and constitutes a major obstacle both to the diagnosis and treatment of ASD (Charman et al. 2017; Jeste and Geschwind 2014; Masi et al. 2017). Furthermore, diagnostic scales used in ASD target relatively heterogeneous groups of behaviors and were not originally developed to sensitively assess social communication or more narrow components of social responsiveness in the context of a clinical trial. 
To date, results from contemporary investigations attempting to characterize and group ASD social and communication impairments and link them mechanistically to biologically proximal information-processing functions have been mixed; no single biomarker or cognitive domain has emerged as primary thus far. Therefore, it is paramount to identify stratification factors that are easily assessed in a clinical setting and that reduce the autistic symptom variance. Overall, few studies have attempted to assess the discriminant properties, reliability and validity of putative markers of core deficits as treatment biomarkers with utility for clinical trial application.

The current work aims to assess the discriminant validity of several promising surrogate markers of social functioning in high-functioning adults with ASD and in healthy volunteers (i.e., to observe whether the direction of difference is as expected). The measures in the study were selected based on their ability to objectively evaluate different system levels of social cognition and communication in a multi-dimensional approach, with the expectation that a fragmentation of social communication processes in ASD would allow for the identification of the measures that best relate to neurobiological or neurocognitive processes and to the disease and/or symptom severity. These measures included the eye-tracking paradigms and olfaction, representing a basic level of screening, attunement to, and extraction of, socially relevant information, and the Affective Speech Recognition test (ASR) and Reading-the-Mind-in-the-Eyes Test (RMET) as an intermediate level corresponding to the ability to capture and process composite information critical for social communication. The results of this work will help to interpret data from multicenter clinical trials and to build a well-characterized battery of objective assessments from which to choose for future clinical trials, contingent on the mechanism of action and the expected pharmacodynamic effect of a drug.

As an exploratory objective, a post-hoc analysis evaluated the utility of one of these surrogates, the Sniffin’ Sticks Screening 12 olfaction identification test (Kobal et al. 1996), as a stratification factor and a predictor of deficits in social interaction and communication. The Sniffin’ Sticks Screening 12 olfaction identification test was chosen because olfaction plays an important role in social communication in humans (Hays 2003; Stevenson 2010; Wysocki and Preti 2004) and, compared with the other exploratory measures, it is the only one for which normative data to classify subgroups exist (Kobal et al. 1996). This test was also selected in the context of the development of the vasopressin antagonist RG7713 in the phase 1 clinical study NCT01474278 (Umbricht et al. 2016), given the evidence of high expression of the V1a receptor (V1aR) in the ventral and lateral portion of the anterior olfactory nucleus, different structures of the olfactory bulb and the olfactory (piriform) cortex, and the presence of V1aR mRNA in endothelial cells of midline blood vessels between the main olfactory bulbs in rats (Ostrowski et al. 1994). In ASD, altered behavioral responses to social chemosignals have been reported (Endevelt-Shapira et al. 2018), implicating olfaction as a potential factor guiding neurodevelopment (Secundo et al. 2014). These reports also point towards a possible involvement of olfaction in abnormal processing of socially salient information and/or its potential to provide a biomarker indexing disruptions of the embryogenic development within critical time frames (Rozenkrantz et al. 2015). For these reasons, we assessed olfaction in high-functioning adults with ASD and healthy controls (HCs) and studied its relation to two fundamental aspects of social cognition: auditory and visual emotion recognition.

A companion manuscript describing the assessment of the concurrent validity of these exploratory assessments and the feasibility of implementation in a clinical study has recently been published (Del Valle Rubido et al. 2018) and helps to contextualize the findings of the present work. These exploratory measures showed varying associations across ASD severity, adaptive skills, and behavior, suggesting that each of the exploratory measures examined has the sensitivity to capture information that individually informs aspects of social functioning, but that they appear to largely tap into functional differences that are at least partially independent.

Methods

Design

Two studies form the basis of this analysis: Study 1 (NCT01669889), a multicenter, observational study that enrolled 19 high-functioning adults with ASD or Asperger’s syndrome according to the Diagnostic and Statistical Manual-Fourth Edition (DSM-IVTR) (American Psychiatric Association 2000) and 19 HC; and Study 2, an intervention trial (NCT01474278) that enrolled 19 high-functioning participants with ASD or Asperger’s syndrome (Umbricht et al. 2016).

Study 1 consisted of a screening period (maximum 35 days, completing diagnostic, clinical, and functional measures) followed by the Day 1 study visit (alternatively, participants were offered the option of combining the screening and study visits in a single day, if eligible). During the Day 1 visit, participants were seen in the clinical research unit and were discharged following completion of all assessments (Fig. 1). HCs, found to be free of current psychopathology, completed assessments according to the same schedule as participants with ASD, except for the measurements of clinical symptomatology at screening. Study 2 obtained data from baseline assessments prior to the administration of intravenous drug or placebo, which were combined with data from the participants with ASD or Asperger’s syndrome who participated in Study 1. No participants took part in both protocols. The same assessments were performed the same number of times and in the same order and schedule in both studies (Fig. 1).

Fig. 1

Study design and schedule of assessments. Identical protocols were performed in Studies 1 and 2; assessments are listed in the order performed. Order and number of test administrations were determined by the pharmacokinetic characteristics of the investigational drug and the burden to participants in Study 2. Some participants opted to combine screening and day 1 visits into a single visit. Assessments of clinical symptomatology (i.e. ABC, ADOS, CGI-S, SCIT and VABS-II) were only performed in the ASD population at the screening visit. Social Communication Interaction Test (SCIT) results are not addressed in this manuscript. ABC aberrant behavior checklist, ADOS autism diagnostic observation schedule, ASR affective speech recognition, CGI-S clinical global impression, RMET reading the mind in the eyes, SCIT social communication interaction test, STAI state/trait anxiety inventory, WASI-II Wechsler Abbreviated Scale of Intelligence II

Inclusion Criteria

Eligible participants were high-functioning (intelligence quotient [IQ] > 70) male patients (18–45 years old) with a diagnosis of ASD or Asperger’s syndrome (DSM-IVTR) (American Psychiatric Association 2000) established by clinical evaluation by an experienced psychologist or psychiatrist and confirmed by scores obtained from the administration of the Autism Diagnostic Observation Schedule (ADOS) (Lord et al. 2002) by a trained clinician. Age-matched healthy males were enrolled in both studies. Full inclusion/exclusion criteria are presented in ESM Table 1 (online resource 1).

Table 1 Baseline characteristics of participants in Studies 1 and 2

Clinical and Functional Measures Administered

Wechsler Abbreviated Scale of Intelligence Version II (WASI-II) (Wechsler 2008)

A brief and reliable measure of adult intelligence, yielding estimates of verbal intelligence quotient (VIQ), performance intelligence quotient (PIQ) and full-scale intelligence quotient (FSIQ).

ADOS Module 4 (Lord et al. 2002) (ASD Group Only)

A validated, examiner-rated, structured instrument that systematically prompts assessment of social behavior and interaction for evaluation of ASD diagnoses. Scoring yields three domains: communication, social interaction, and the combined communication and social interaction score. Standard published scoring cutoff scores were applied; higher scores indicate greater levels of core deficits.

Aberrant Behavior Checklist-Community Version (ABC-C) (Aman et al. 1985)

An informant-rated, 58-item questionnaire normed for developmentally disabled populations. Ratings generate scores on five factors: irritability, lethargy and social withdrawal, stereotypic behavior, hyperactivity/non-compliance and inappropriate speech. Items are scored from 0 (no problem) to 3 (severe problem).

Vineland Adaptive Behavior Scale-II (VABS-II) (Sparrow 2011) (ASD Group Only)

A caregiver-rated, semi-structured interview to assess adaptive behavior and skills in developmental disorders. Measures adaptive behavior across subscales of communication, daily living skills, and socialization. A composite score is generated as a measure of overall functioning. Higher scores correspond to better adaptive behavior skills.

Clinical Global Impression-Severity (CGI-S) (Guy 1976) (ASD Group Only)

A clinician-rated measure of overall illness severity. Ratings are made on a seven-point scale ranging from one (normal) to seven (among the most extremely ill patients).

Autism Spectrum Quotient (AQ) (Baron-Cohen et al. 2001a, b) (Study 1 Only)

A self-report measure for normally functioning individuals, assessing autistic traits. It includes 50 items that assess communication, social skills, imagination, attention to detail and attention switching. Higher AQ scores indicate more autistic traits and cognitive characteristics; in surveys, AQ scores greater than 32 are highly discriminating between individuals with ASD compared with non-affected individuals.

State/Trait Anxiety Inventory-State (STAI) (Spielberger 2010)

A measure of state (i.e. situational) and trait (stable, constitutional) anxiety, consisting of 20 self-completed items scored on a Likert scale of one (not at all) to four (very much so). Higher STAI scores indicate greater anxiety.

Exploratory Assessments

Eye Tracking

Eye tracking has been shown to be a robust technique for studying processes associated with social impairments in individuals with ASD and for quantifying differences in looking behaviors between individuals with ASD and HCs (Chita-Tegmark 2016; Frazier et al. 2017). A Tobii T60XL 60-Hz eye-tracking system was used to measure gaze positions of participants during the following paradigms:

(1) Activity monitoring (Shic et al. 2011; Shic et al. 2014): Participants viewed videos of two actresses performing simple social activities involving joint play with visually salient distracters in the background. Dependent variables included the ratio of time spent looking at heads, bodies, activities, and backgrounds relative to total time spent looking at the scenes.

(2) Biological motion preference task (biomotion), based on Annaz et al. (2012): Participants watched multiple trials of various side-by-side videos containing, in pseudorandomized left–right order, dynamic point-light displays of a human actor’s performance of an activity (e.g., walking, jumping a rope, waving: biomotion condition, i.e., biological motion) (Carnegie Mellon University Graphics Lab) alongside computer-generated animations of moving dots (control condition). Dependent variables included the ratio of looking at biomotion over looking at either biomotion or the control condition, and the proportion of times participants oriented to biomotion first compared with control first.

(3) Biological motion detection (biodetection) (Kaiser et al. 2010): Under masked and unmasked conditions, participants were shown videos of biological or mechanical movement. Participants were asked to identify whether a human being was walking or not. The dependent variable was d-prime, a measurement of signal detection ability.

(4) Complex dynamic social task (Klin et al. 2002): Participants were shown a series of short clips from the movie Who’s Afraid of Virginia Woolf (WAVW), which displayed scenes with complex social interactions with high emotional tone. Dependent variables included attention (% of total looking time) to: eyes of the actors; mouths of the actors; the bodies of the actors; and background regions including objects.

(5) Gaze and (6) Gender discrimination in a static face scanning task (Andari et al. 2010): Participants viewed a series of static photographs of human faces with instructions to either identify gaze direction (direct/averted) or gender (male/female). Primary dependent variables included attention (percentage of total looking time) to facial region and non-facial regions. Secondary dependent variables included time spent looking at the mouth and eyes of the presented faces.

(7) Human activity preference task (Pierce et al. 2011): In this preferential looking task, two videos were presented simultaneously, side-by-side, on a single screen, with one video depicting a human performing an activity (Human) and the other video a computer-generated, continuously moving/changing geometric pattern (Geometry). The left–right ordering of videos was counter-balanced. The dependent variable was attention (percentage of looking) to Human compared with Geometry. This paradigm is referred to as the GeoPref task by the paradigm designers (Pierce et al. 2015); however, for simplicity and consistency with the outcome measure, we refer to this paradigm as the Human Activity preference task.
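
For the biological motion detection task in item (3), d-prime is conventionally computed as the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The following Python sketch illustrates that standard definition only; the function name, the example counts, and the log-linear correction for extreme rates are assumptions made here for illustration and are not taken from the study protocol.

```python
from statistics import NormalDist


def d_prime(hits: int, signal_trials: int,
            false_alarms: int, noise_trials: int) -> float:
    """Return d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count and 1 to each trial
    total) keeps the rates away from 0 and 1, where z is undefined.
    This correction is an assumption of the sketch.
    """
    hit_rate = (hits + 0.5) / (signal_trials + 1)
    fa_rate = (false_alarms + 0.5) / (noise_trials + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts, for illustration only.
chance = d_prime(hits=10, signal_trials=20, false_alarms=10, noise_trials=20)
good = d_prime(hits=18, signal_trials=20, false_alarms=3, noise_trials=20)
perfect = d_prime(hits=20, signal_trials=20, false_alarms=0, noise_trials=20)
```

A participant responding at chance obtains a d-prime of zero, and larger values indicate better discrimination of biological from mechanical movement.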

A composite score was defined a priori by one of the current authors (FS), derived from the key parameters of all tasks, with weights based loosely upon results collected in a separate pilot study conducted prior to the current study. This composite score was intended to capture orientation and attention to socially relevant information, with lesser emphasis on the human activity preference and WAVW tasks and least emphasis on biological motion detection ability. Trials were considered valid if they contained more than 70% valid gaze looking time collected during the stimulus presentation, and/or exhibited a calibration uncertainty of less than 1.5 degrees in the trial, or less than 3 degrees over the entire session (Shic 2008). A participant’s session data were considered valid for a paradigm if it contained 50% or more valid trials.
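
The validity rules above can be sketched in a few lines of Python. The text combines the gaze criterion and the calibration criterion with "and/or"; this sketch takes the stricter reading (both must hold), which is an assumption, and all function names and example values are hypothetical.

```python
def trial_is_valid(gaze_fraction: float,
                   trial_uncertainty_deg: float,
                   session_uncertainty_deg: float) -> bool:
    """Apply the per-trial criteria under the stricter 'and' reading (assumption)."""
    # More than 70% valid gaze looking time during stimulus presentation.
    gaze_ok = gaze_fraction > 0.70
    # Calibration uncertainty below 1.5 degrees in the trial,
    # or below 3 degrees over the entire session.
    calibration_ok = (trial_uncertainty_deg < 1.5
                      or session_uncertainty_deg < 3.0)
    return gaze_ok and calibration_ok


def session_is_valid(trial_flags: list[bool]) -> bool:
    """A paradigm's session is valid if 50% or more of its trials are valid."""
    if not trial_flags:
        return False
    return sum(trial_flags) / len(trial_flags) >= 0.5


# Hypothetical trials: (gaze fraction, trial uncertainty, session uncertainty).
trials = [(0.85, 1.0, 2.0), (0.60, 1.0, 2.0), (0.90, 2.0, 3.5), (0.75, 2.0, 2.5)]
flags = [trial_is_valid(*t) for t in trials]
paradigm_valid = session_is_valid(flags)
```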

Pupillometry (Anderson et al. 2013)

Measurement of pupil diameter during the performance of the eye-tracking paradigms was integrated into the apparatus. Differences in tonic pupil size indicate the level of arousal in response to the stimulus presented (Laeng et al. 2012), and children with ASD have been shown to have larger tonic pupil sizes than HCs (Anderson and Colombo 2009). In our study, we did not consider dynamic changes, only tonic pupil sizes.

Reading the Mind in the Eyes Test-Revised (RMET) (Baron-Cohen et al. 1997; Baron-Cohen et al. 2001a, b)

A measure of TOM and facial affect processing, RMET scores have been found to discriminate patients with ASD from typical controls (Baron-Cohen et al. 2001a, b). Participants are presented with 36 different pictures of the eye region of human faces. Participants then have to select one of four different emotion labels that describe the emotion the person is feeling (Baron-Cohen et al. 2001a, b). Higher scores indicate greater TOM and detection of emotion.

Affective Speech Recognition (ASR) (Hollander et al. 2007)

A measure of emotion recognition ability, determined by accuracy of identification of affective speech. Participants are played a recording of four sentences of neutral content with eight different emotional intonations (angry, disgusted, fearful, happy, lustful, neutral, sad and surprised). Each intonation is repeated six times for a total of 48 sentences. Participants have to select the appropriate emotion from a list (Hollander et al. 2007). Higher scores indicate better emotion recognition. ASR performance has been described as impaired in patients with right temporoparietal lesions (Heilman et al. 1975), and ASR scores showed possible improvements after intravenous oxytocin compared with placebo administration in adults with ASD (Hollander et al. 2007).

Olfactory Measure Sniffin’ Sticks Screening 12 test (Hummel et al. 2007; Kobal et al. 1996)

Participants were asked to smell twelve Sniffin’ Sticks (pen-like devices containing different odors), one at a time, and to select the correct odor descriptor from a selection of four possible choices. One point was scored for each correct answer (maximum score 12). Participants were instructed not to eat, to drink only water, and to avoid chewing gum or smoking cigarettes for at least fifteen minutes before the test. A brief history was collected regarding the participant’s current allergies and nasal congestion in order to ensure the validity of the test results. All participants were considered evaluable by the investigators at the time of testing. Based on their scores, participants were grouped into normal (normosmic) and impaired (< 10 points) according to normative data, and the impaired group was further subgrouped into hyposmic (< 10 and > 4 points) and anosmic (< 4 points) (Kobal et al. 1996).
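
The grouping rule can be written out as a small Python sketch. The cut-offs are those stated above; the function names are hypothetical, and because the stated ranges do not cover a score of exactly 4, the sketch returns "unspecified" for that value rather than guessing.

```python
def olfactory_status(score: int) -> str:
    """Classify a Sniffin' Sticks Screening 12 score (0 to 12 correct answers)."""
    if not 0 <= score <= 12:
        raise ValueError("score must be between 0 and 12")
    if score >= 10:
        return "normosmic"
    if score > 4:
        return "hyposmic"   # fewer than 10 but more than 4 points
    if score < 4:
        return "anosmic"    # fewer than 4 points
    return "unspecified"    # exactly 4: not covered by the stated cut-offs


def is_impaired(score: int) -> bool:
    """Impaired olfaction is any score below 10 points."""
    return olfactory_status(score) != "normosmic"


# Classify every possible score for a quick overview.
labels = {s: olfactory_status(s) for s in range(13)}
```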

Statistical Analysis

All statistical analyses were conducted with SAS software (SAS Institute Inc., Cary, NC, USA) and R (R Foundation for Statistical Computing, Vienna, Austria). An analysis of covariance (ANCOVA) model was used to estimate mean differences between ASD and HC, adjusting for age and FSIQ. The FSIQ, a comprehensive and broadly used measure of general cognitive and intellectual functioning that is also frequently used as part of the diagnosis of intellectual disabilities, was selected as a covariate. These differences and their standard errors were estimated from the models. The ANCOVA model used in the analyses assumes equal variances between the two groups. For eye tracking and pupillometry measures, an additional random subject effect was introduced into the model in order to account for correlations between the two repeated measurements. Cohen’s f-squared was used as a measure of the effect size for group differences. For comparisons of normosmic vs. hyposmic ASD subjects, an ANOVA model with group as a fixed effect was used. To maintain an experiment-wise error rate (alpha) of 0.05 across multiple comparisons, we used the Bonferroni correction; the adjusted threshold for significance is an alpha of 0.00094. Due to the exploratory nature of the analyses, p-values should be interpreted with caution, as descriptive measures of trend rather than determinants of statistical significance.
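
As a worked illustration of two quantities named above, the Python sketch below computes a Bonferroni-adjusted threshold and Cohen's f-squared from model R-squared values. The number of comparisons is not stated in the text; 53 is used here only because 0.05/53 reproduces the reported threshold of 0.00094, so it should be read as an inference. The f-squared formula is the conventional definition, and the R-squared inputs are hypothetical; the exact computation used in the study is not described.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold that keeps the experiment-wise rate at alpha."""
    return alpha / n_comparisons


def cohens_f2(r2_full: float, r2_reduced: float = 0.0) -> float:
    """Cohen's f-squared for the effect added by the full model.

    f2 = (R2_full - R2_reduced) / (1 - R2_full)
    With r2_reduced = 0 this reduces to R2 / (1 - R2).
    """
    return (r2_full - r2_reduced) / (1.0 - r2_full)


# 53 comparisons is an inferred value, not stated in the text.
threshold = bonferroni_alpha(0.05, 53)

# Hypothetical R-squared values: covariates only vs. covariates plus group.
effect = cohens_f2(r2_full=0.30, r2_reduced=0.20)
```

By Cohen's commonly cited benchmarks, f-squared values of about 0.02, 0.15, and 0.35 correspond to small, medium, and large effects.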

Ethics

Both studies were conducted in accordance with the principles of the Declaration of Helsinki and Good Clinical Practice. The study sites were the Albert Einstein College of Medicine, Bronx, NY, USA; the UCLA Semel Institute CAN Clinic, Los Angeles, CA, USA; and the Child Study Center at Yale University School of Medicine in New Haven, CT, USA. Study protocols were reviewed and approved by the institutional review boards of each institution.

Results

Baseline characteristics for all participants are shown in Table 1. Recruitment was intended to include IQ-matched HCs; however, although mean age was similar, mean IQ was higher in the HC group (Table 1). Median FSIQ score was 116.0 in the HC group (range 101.0–140.0) and 100.0 in the ASD group (range 71.0–136.0).

Between-Group Comparisons

Baseline Assessments of Clinical Symptomatology: AQ and STAI

From the assessments performed in both ASD and HC groups, the AQ was the assessment which showed the greatest effect size between ASD and HC (ASD group [least squares mean] 29.72; HC group: 13.01; Δ = 16.71; f2 = 1.3; p < 0.001; t = 6.66; df = 34). However, no differences were observed in the overall level of anxiety measured by the STAI total score (ASD group 38.69; HC group: 31.09; Δ = 7.60; f2 = 0.062; p = 0.076; t = 1.81; df = 53).

Exploratory Assessments

Eye Tracking

Based on the criteria for valid trials, 86.2% of trials from the ASD group and 85.0% of trials from the HC group were valid. No between-group differences in trial acquisition were present in any eye-tracking outcome measure.

Results showed differences (ASD-HC) in the expected direction for activity monitoring, biomotion, human activity preference (social compared with geometry) and composite score (Fig. 2, Table 2). In activity monitoring, participants with ASD spent less time looking at the head (Δ = –0.11; [90% CI − 0.16 to − 0.07]; f2 = 0.65; p < 0.0005; t = 4.04; df = 48) and the person (Δ = –0.09; [90% CI − 0.14 to − 0.04]; f2 = 0.52; p = 0.005; t = 2.94; df = 48) than HC (Table 2). In the biomotion task, participants with ASD showed less preference for biological motion compared with control stimuli (Δ = –0.09; [90% CI − 0.15 to − 0.03]; f2 = 0.35; p = 0.02; t = − 2.11; df = 50) (Table 2). In the human activity preference task, participants with ASD showed less preference for human activities as compared with geometric shape videos (Δ = − 0.19; [90% CI − 0.30 to − 0.08]; f2 = 0.32; p < 0.01; t = 2.8; df = 48). Finally, the composite score capturing key parameters across eye-tracking tasks was lower for participants with ASD compared with HC, indicating a general deficit affecting attention to socially relevant information (Δ = − 0.48; [90% CI − 0.86 to − 0.11]; f2 = 0.3; p = 0.04; t = 2.14; df = 49).

Fig. 2

Eye tracking results for participants with ASD and HCs. P-values less than 0.00094 are considered statistically significant after multiplicity adjustment. Data are estimated mean ratios ± 90% confidence intervals for the amount of time spent looking at a specific area of interest and the total amount of time looking at the whole screen for each paradigm. ASD autistic spectrum disorder, HC healthy control

Table 2 Eye tracking data for ASD and HC groups

Pupillometry

The ASD group had larger pupil sizes than the HC group during all seven eye tracking paradigms: activity monitoring (Δ = 0.35 mm; [90% CI 0.09 to 0.60]; f2 = 0.16; p = 0.03; t = 2.25; df = 50), d-prime (Δ = 0.33 mm; [90% CI 0.02 to 0.64]; f2 = 0.13, p = 0.09; t = 1.74; df = 49), biomotion preference (Δ = 0.33 mm; [90% CI 0.06 to 0.59]; f2 = 0.2; p = 0.05; t = 2.01; df = 49), gaze discrimination (Δ = 0.36 mm; [90% CI 0.10 to 0.63]; f2 = 0.17; p = 0.03; t = 2.25; df = 50), gender discrimination (Δ = 0.39 mm; [90% CI 0.13 to 0.64]; f2 = 0.2; p = 0.02; t = 2.45; df = 50), human activity preference (Δ = 0.29 mm; [90% CI 0.03 to 0.55]; f2 = 0.17; p = 0.07; t = 1.85; df = 49) and WAVW (Δ = 0.39 mm; [90% CI 0.11 to 0.67]; f2 = 0.22; p = 0.03; t = 2.26; df = 50) tasks (Table 3).

Table 3 Between-group comparisons of pupillometry data

RMET and ASR

Participants with ASD showed similar performance on both the RMET and ASR total scores compared with HCs; RMET: ASD group 67.7%; HC group: 74.7% (Δ = − 6.98; [90% CI − 16.52 to 2.56]; f2 = 0.045; p = 0.224; t = − 1.24; df = 34); ASR: ASD group 55.6% correct answers; HC group: 57.3% correct answers (Δ = − 1.67 [90% CI − 8.78 to 5.44]; f2 < 0.01; p = 0.70; t = − 0.39; df = 52). In the ASR, the largest differences were observed for the individual emotions disgust (ASD group 60.5%; HC group: 72.6%; Δ = − 12.13 [90% CI − 25.05 to 0.79]; f2 = 0.052; p = 0.12; t = − 1.57; df = 48), and happiness (ASD group 51.7%; HC group: 62.5%; Δ = − 10.80 [90% CI − 22.11 to 0.52]; f2 = 0.05; p = 0.12; t = − 1.6; df = 51). Interestingly, participants with ASD identified the emotions fearful (ASD group 57.5%; HC group: 45.7%; Δ = 11.80 [90% CI –0.24 to 23.85]; f2 = 0.056; p = 0.11; t = 1.64; df = 48) and surprise (ASD group 63.9%; HC group: 54.2%; Δ = 9.77 [90% CI − 0.82 to 20.36]; f2 = 0.048; p = 0.13; t = 1.55; df = 50) more easily than HCs. However, none of these differences were statistically significant.

Olfaction

We first assessed the difference in smell identification scores between the two groups. No differences were observed between participants with ASD and HCs in the mean percentage of correct answers on the Sniffin’ Sticks Screening 12 olfaction identification test (ASD group: 84.8%; HC group: 90.2%; Δ = − 5.35 [90% CI − 13.06 to 2.37]; f2 = 0.04; p = 0.25; t = − 1.17; df = 34). Next, as a post-hoc analysis, we explored the potential of olfaction as a stratification factor by grouping participants according to their olfactory status. The olfactory status was defined by a threshold of 10 points on the Sniffin’ Sticks Screening 12 olfaction identification test based on normative data (Kobal et al. 1996). Seventeen HCs and 22 ASD participants were classified as normosmic; two HCs and 16 ASD participants were classified as hyposmic or anosmic. For this analysis, the two hyposmic HCs were excluded, since a sample size of two was not large enough to allow reliable comparisons with the other three groups [HC normosmic, ASD normosmic, ASD hyposmic]. Compared with HCs, we observed a significantly higher rate of ASD participants with impaired olfaction (Fisher’s exact test: ASD 42.1%; HC 10.5%; p = 0.018; 95% CI [1.16–61.04]; sample estimates: OR 6.01). Impaired olfaction also identified meaningful subgroups in terms of IQ, adaptive skills, social functioning, and social cognition (Table 4).
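
The Fisher's exact comparison can be reproduced approximately from the counts given above (16 of 38 ASD participants and 2 of 19 HCs with impaired olfaction). The following Python sketch uses only the standard library and the usual two-sided rule of summing all tables that are no more probable than the observed one; it is an illustration, not the authors' analysis code, and the function name is hypothetical. The cross-product odds ratio it computes is slightly larger than the reported 6.01, presumably because the reported value is a conditional estimate.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than observed.
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Counts from the text: rows are ASD and HC, columns are impaired and normosmic.
asd_impaired, asd_normal = 16, 22
hc_impaired, hc_normal = 2, 17

p_value = fisher_exact_two_sided(asd_impaired, asd_normal, hc_impaired, hc_normal)
rate_asd = asd_impaired / (asd_impaired + asd_normal)   # about 42.1%
rate_hc = hc_impaired / (hc_impaired + hc_normal)       # about 10.5%
sample_odds_ratio = (asd_impaired * hc_normal) / (asd_normal * hc_impaired)
```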

Table 4 Differences between normal and impaired olfaction ASD groups

Finally, we evaluated the association between olfactory status and the outcomes on the different assessments. When compared with participants with ASD with normal olfaction, the ASD participants with olfaction dysfunction showed reduced emotion recognition ability on two tasks of TOM: lower accuracy on the RMET (Δ = − 19.87 [90% CI − 31.29 to − 8.46]; f2 = 0.38; p < 0.01; t = − 2.54; df = 17) and on the ASR overall (Δ = − 14.91 [90% CI − 23.88 to − 5.93]; f2 = 0.367; p < 0.01; t = − 3.58; df = 35), driven mainly by a lower identification of the individual emotions disgust (Δ = − 31.35 [90% CI − 48.22 to − 14.48]; p = 0.004) and surprise (Δ = − 18.65 [90% CI − 33.30 to − 4.01]; p = 0.039). The normosmic ASD subgroup showed performances comparable to those of the normosmic HCs on the percentage correct scores of ASR (Δ = 1.48 [90% CI − 5.64 to 8.40]; p = 0.73) and RMET (Δ = 7.86 [90% CI − 1.20 to 16.91]; p = 0.151). ASD participants with impaired olfaction showed a lower FSIQ (Δ = − 9.01 [90% CI − 18.63 to 0.60]; f2 = 0.111; p = 0.12; t = − 2.0; df = 36) influenced by a lower verbal IQ (Δ = − 14.58 [90% CI − 24.56 to − 4.60]; f2 = 0.026; p = 0.02; t = − 0.97; df = 36), and more irritability (ABC irritability subscale) (Δ = 6.96 [90% CI 1.12 to 12.81]; f2 = 0.049; p = 0.05; t = 1.33; df = 36) (Table 4) when compared with ASD participants with normal olfaction. Olfactory impairment did not influence the performance on eye tracking variables, VABS-II, or ADOS subdomain scores.

Discussion

This study sought to identify discriminant properties of putative surrogate markers relating to social dysfunction in adults with ASD. Measures differentiating participants with ASD from HCs were pupillometry, quantifying arousal during task performance, and three of seven eye tracking paradigms (preference for heads in activity monitoring, preference for biological motion compared with synthetic movements, and preference for videos of human movements compared with geometric shape videos). However, looking at the head during the activity monitoring task was the only measure that survived Bonferroni correction, and no group difference in pupil size remained significant. Our findings are consistent with the majority of the ASD literature, which relies heavily on studies of younger subjects. Our report is unique in that few studies have applied such an extensive and broad battery of potential surrogate markers of ASD in adults, with the potential exception of consortia focused on this topic, namely EU-AIMS (Loth et al. 2017), the Autism Biomarkers Consortium for Clinical Trials (ABC-CT) (Foundation for the National Institutes of Health 2018), and InFoR-Autism (Fondation Fondamental 2018), and industry efforts, e.g., JAKE® (Ness et al. 2017). One of the most novel findings relates to our identification of a high proportion of adults with ASD with evidence of impaired olfaction. It could be argued that the data may be skewed, since there appears to be a ceiling effect in the HC group more than in the ASD group, which 'compresses' the normal scores. Hence, a difference relative to an overall ASD population displaying larger variability is more difficult to demonstrate. A dichotomization by olfactory status offers a solution. Among the general population, the prevalence of olfactory impairments seems to be age-related and has been reported to be between 19 and 22% in individuals between 16 and 55 years of age (Bramerson et al. 2004; Hummel et al. 2007; Vennemann et al. 2008). At roughly double the normal rate, 42% of the participants with ASD showed olfaction dysfunction in our study. Despite small samples, differing olfaction test procedures, and non-standard scoring, it is notable that impairments in identification of odorants as well as differences in the rating of intensity and pleasantness/unpleasantness have been reported in adults with Asperger’s syndrome (Suzuki et al. 2003) and ASD (Wicker et al. 2016). Nevertheless, negative results have been reported as well: one study used fewer stimuli and did not score per convention (Addo et al. 2017). Another study identified no differences in olfaction detection thresholds or adaptation to continued stimulus presentation in adults with ASD but did not test for accuracy (Tavassoli and Baron-Cohen 2012). However, our results were confirmed in the recently completed phase 2 study VANILLA (NCT01793441) (data on file, Roche) (Bolognani et al. 2019), in which the same olfaction test was assessed at baseline in 191 high-functioning male adults with ASD and 48.17% showed olfaction dysfunction. Taken together, our data coupled with prior reports provide significant support for an increased prevalence of olfaction dysfunction in ASD.
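
To give a sense of the uncertainty around the VANILLA baseline figure, the sketch below computes a Wilson score interval for a proportion in Python. The count of 92 out of 191 is inferred from the reported 48.17% and is an assumption, as are the function name and the choice of a 95% level. The comparison with the upper end of the 19 to 22% general-population range is illustrative only and is not an analysis reported in the study.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed count: 48.17% of 191 participants corresponds to 92 individuals.
low, high = wilson_interval(92, 191)
general_population_upper = 0.22  # upper end of the 19-22% range cited in the text

print(f"Observed proportion: {92 / 191:.4f}")
print(f"95% Wilson interval: {low:.3f} to {high:.3f}")
print("Interval lies above the general-population range:", low > general_population_upper)
```

The Wilson interval is used here instead of the simpler normal approximation because it behaves better for moderate sample sizes; any other standard binomial interval would serve the same illustrative purpose.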

There is an increasing recognition that olfactory problems may be predictive of social impairment in children with ASD (Kumazaki et al. 2018; Lane et al. 2010; Hilton et al. 2007). Olfaction identification scores have been moderately correlated with reciprocal conversation skills (r = − 0.56) and social chatting scores (r = − 0.44) from the Autism Diagnostic Interview-Revised test (Bennetto et al. 2007), and taste/smell sensitivity has been identified as a predictor of maladaptive behaviors (r = − 0.53) measured by the VABS (Lane et al. 2010). Rozenkrantz et al. observed a significant association between sniff response to odor valence and the social affect component of the ADOS in children, together with an association between olfaction and FSIQ, thereby suggesting a mechanistic link between the response to olfactory stimuli and ASD through impaired sensory-motor systems that modulate social communication (Rozenkrantz et al. 2015). In the assessment of concurrent validity in this study, we observed that reduced olfaction was associated with worse emotion recognition ability on both the RMET (r = 0.54) and the ASR (r = 0.40), possibly indicating greater impairments in ToM capacity, as well as with communication deficits in the ADOS communication domain (r = − 0.34) and in the inappropriate speech subscale of the ABC (r = − 0.32). Olfaction identification also correlated with the VIQ (r = 0.47) and FSIQ (r = 0.40) (Del Valle Rubido et al. 2018). These subanalyses did not control for differences in IQ; thus, the contribution of IQ differences to these associations is unknown. However, given the links between olfaction and the development of social cognition and the fact that olfactory identification also relies upon intact orbitofrontal cortical (OFC) functioning, further research is warranted to clarify both the potential of olfaction as a biomarker for social deficits in ASD and the underlying biological mechanisms.
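
The open question about IQ could be probed with a first-order partial correlation. The Python sketch below is illustrative only: the olfaction-RMET (0.54) and olfaction-FSIQ (0.40) correlations are taken from the text, whereas the RMET-FSIQ correlation is not reported here, so the values tried for it are hypothetical placeholders, as is the function name.

```python
from math import sqrt


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after partialling out z.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("partial correlation undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denom


r_olf_rmet = 0.54  # olfaction vs. RMET, from the text
r_olf_fsiq = 0.40  # olfaction vs. FSIQ, from the text

# Hypothetical RMET-FSIQ correlations (not reported in this section).
for r_rmet_fsiq in (0.2, 0.4, 0.6):
    adjusted = partial_corr(r_olf_rmet, r_olf_fsiq, r_rmet_fsiq)
    print(f"assumed r(RMET, FSIQ) = {r_rmet_fsiq:.1f} -> partial r = {adjusted:.2f}")
```

The stronger the assumed link between emotion recognition and FSIQ, the more the olfaction-RMET association shrinks once IQ is held constant, which is why an IQ-adjusted analysis would be informative.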

Our findings are not surprising, as olfaction has been established as a critical element in affective matching after the age of 5 years in typically developing children (Cavazzana et al. 2016). It plays a key role in bonding (Bowlby 1980; Sullivan et al. 2011; Wedekind and Penn 2000) and highly influences interpersonal relationships (Huttenbrink et al. 2013). Research has already identified olfaction as an indicator of neuronal, social and cognitive development (Rozenkrantz et al. 2015), and it may also be a marker for severe central nervous pathology affecting social communication (Amaral et al. 2008; Huttenbrink et al. 2013). Krajnik et al. suggested a relationship between olfactory dysfunction and interoceptive awareness (Krajnik et al. 2015). Recent research has also drawn attention to the association between interoceptive abnormalities and ASD (Barttfeld et al. 2012; Elwin et al. 2012; Fiene and Brownlow 2015; Garfinkel et al. 2016; Hatfield et al. 2017; Noel et al. 2018) as well as other psychiatric disorders characterized by emotional impairment (Furman et al. 2013; Pollatos et al. 2009; Stevens et al. 2011). Interoception could also be associated with the sensory processing abnormalities found in ASD, which are now an important aspect of the ASD diagnostic criteria per the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (American Psychiatric Association 2013). The relationships between olfactory dysfunction, sensory processing and interoception in ASD remain to be elucidated.

Notably, another consistent finding was the larger pupil size during the eye-tracking assessments in the ASD group compared with the HC group, with moderate to large effect sizes (0.60 to 0.85), suggesting dysregulated autonomic arousal in response to environmental stimuli as a prominent phenotype in ASD (Anderson and Colombo 2009; Kushki et al. 2013; Hirstein et al. 2001; Anderson et al. 2013; Corbett et al. 2010). We did not assess pupillometry using standardized stimuli (e.g., flashes of light, as in Nystrom et al. 2015) or baseline pupil measurements outside of the eye tracking experiments. In addition, this study was not designed to test pupil response but rather to provide a straightforward comparison of pupil sizes during tasks. Therefore, it is unknown whether the larger pupil sizes are a baseline characteristic, a reaction to the task, or a reaction to specific social or non-social stimuli within each task. Increased tonic pupillary size noted in children with ASD with evidence of lower sympathetic tone (Anderson et al. 2013) and lower electrodermal activity and responses (Kushki et al. 2013) support the position of an abnormal autonomic nervous system response in the pathophysiology of ASD. In our correlation analysis of these measures, pupillometry was mostly unrelated to ASD severity and core social deficits, except for the biomotion task, which correlated with the ADOS total score (r = − 0.33) and the communication subdomain of the Vineland (r = 0.36). However, larger pupil size was consistently related to lower behavioral ratings of hyperactivity on the ABC (r values ranging from − 0.36 to − 0.43) and higher FSIQ (ranging from 0.38 to 0.45) and PIQ scores (ranging from 0.36 to 0.44), while the VIQ remained unrelated (Del Valle Rubido et al. 2018). Thus, in ASD, better functioning is associated with larger pupil sizes.

Pupil dilation is known to be modulated by the brain’s locus coeruleus-norepinephrine system (Rajkowski et al. 1993), which controls physiological arousal (Samuels and Szabadi 2008) and cognitive functioning (Ramos and Arnsten 2007; Sara 2009), and has been used as a measure of subjective task difficulty, mental effort, and neural gain (Eckstein et al. 2017). As a reflection of greater arousal or effort while engaged in task performance, pupil size may indicate the ability to better marshal effortful attention during the eye tracking as a sign of greater cognitive or inhibitory control, and may prove useful in studying this separate important dimension of co‐occurring inattentive and disruptive behavior symptoms in ASD (McCracken 2011) or intellectual disability. While this explanation would appear to be inconsistent with the between-group difference observed, where ASD participants were shown to have larger pupil sizes than HCs, it is important to note that all eye-tracking tasks presented were fundamentally implicit or explicit tasks of social cognition. It is possible that HCs required less effortful attention to complete these tasks due to an inherently greater facility in social information processing. Another possibility is the model of chronic autonomic nervous system hyperarousal in ASD, describing a chronic biological threat response, put forward by Patriquin et al. based on a review of the cardiac literature in adults and children with ASD (Patriquin et al. 2019; Edmiston et al. 2016; Guy et al. 2014; Bal et al. 2010; Van Hecke et al. 2009; Ming et al. 2005; Denver 2004). Based on the Polyvagal Theory (Porges 1995), Patriquin et al. suggest a potential difference in the information flowing from the brain to the periphery in individuals with ASD due to differences in the neuroception of safety versus threat, resulting in greater autonomic hyperarousal in ASD. 
Latent hyperarousal differences between ASD and HCs could explain the between-group pupil size differences observed, with this effect modulated by differences in autonomic flexibility observed between individuals with ASD with and without intellectual impairment (Van Hecke et al. 2009; Cohen and Johnson 1977; Goodwin et al. 2006; Miller and Bernal 1971; Palkovitz and Wiesenfeld 1980; Sigman et al. 2003; Sheinkopf et al. 2013). Although the precise determinants of increased pupillary size in ASD remain to be clarified, pupillometry could also be informative for subject stratification efforts, depending on the intervention.

Extending the results of previous work, we demonstrated atypical gaze patterns in eye tracking in the activity monitoring, biological motion preference and human preference tasks (Annaz et al. 2012; Chawarska et al. 2013; Shic et al. 2014) in the ASD group. However, no differences were observed in the remaining four of the seven eye-tracking paradigms (biodetection, WAVW, gaze discrimination and gender discrimination). This contrasts with the results of many previous studies in younger subjects, which showed that the best predictor of autism was reduced eye region fixation time (Auyeung et al. 2015; Klin et al. 2002). Moreover, despite the association found between looking at the mouth and social communication skills (Del Valle Rubido et al. 2018), there was no difference in fixation on the mouth between groups.

The failure to replicate previous eye tracking findings may be explained by several factors. First, negative and inconclusive results may be under-reported, given the dearth of studies investigating eye pattern differences in adults with and without ASD or in subgroups within the ASD population (Zamzow et al. 2014). Second, divergent eye gaze patterns may depend on the nature of the stimuli presented (dynamic or static, real-life and naturalistic or non-naturalistic, social or non-social) (Hanley et al. 2015; Hanley et al. 2013; Speer et al. 2007; Manyakov et al. 2018). More likely, however, is the possibility that high-functioning adults with ASD might ultimately succeed in reaching the developmental level of neurotypicals, with overall minor differences in eye gaze patterns (Baez et al. 2012; Ullman and Pullman 2015), by developing compensatory mechanisms or implementing strategies to read faces (Bauminger 2002; Dawson et al. 2005; Hwang and Hughes 2000) and/or detect biological motion.

Our study also showed little relationship between eye tracking measures, adaptive behaviors measured by the Vineland, other measures of social perception and olfaction. Nonetheless, small to moderate correlations were found between the activity monitoring, WAVW, and gender discrimination tasks and the severity of ASD symptoms and behavior measured by the ADOS and ABC (Del Valle Rubido et al. 2018). Of all these tasks, the only paradigm for which the correlation results (Del Valle Rubido et al. 2018) and the between-group differences highlighted here were consistent was looking at people in Activity Monitoring (greater looking at people associated with lower autism symptom severity in ASD, and less looking at people, especially the head, in ASD as compared to HCs). Associations with phenotype within ASD and ASD-HC between-group differences were in an opposite-to-expected direction for human activity preference, with poorer adaptive communication associated with greater human looking within ASD, but less looking at the human versus the geometric shape observed here in ASD as compared to HCs. Other tasks showed significant findings for either correlations (Del Valle Rubido et al. 2018) or between-group comparisons, but not both. These patterns highlight the difficulty of straightforwardly extending between-group comparisons of ASD and HC groups to relationships within ASD. Factors which may impact the directionality and strength of effects could include reduced dynamic range within the ASD or HC groups, comorbid psychiatric features such as anxiety or depression in ASD, as well as fundamentally different mechanisms impacting social scene gaze patterns within ASD as compared to across groups, similar to those we have proposed for pupil size relationships. Further studies are necessary to clarify these complex relationships.

Perhaps somewhat surprising was the lack of group differences observed between the ASD and HC groups for two measures, the RMET and the ASR, contrary to prior studies (Baron-Cohen et al. 2001a, b, 2015; Holt et al. 2014; Kaland et al. 2008). A review by Sivaratnam et al. found inconsistent reports of ToM impairments in structured test settings in high-functioning ASD groups (Happe 1995; Bauminger 2002), in contrast to clear impairments revealed in naturalistic test settings (Rump et al. 2009; Dziobek et al. 2006) and in everyday functioning (Rieffe et al. 2000). This suggests that paradigms measuring ToM in non-naturalistic social settings may not provide an accurate pattern of functioning in ASD groups (Sivaratnam et al. 2015; Adolphs 2001; Klin 2000; Baron-Cohen et al. 1985; Leslie and Frith 1990; Weeks and Hobson 1987). Klin et al. (2003) theorized that, due to differences in learning, individuals with ASD may develop compensatory strategies which help them score well on standardized tests. Yet difficulties may remain when applying this cognitive potential and the appropriate set of social skills in naturalistic contexts (Klin et al. 2007, 2003). Our findings also reflect this contradiction. On the one hand, despite the lack of group differences, the ASR and RMET were significantly related to each other (r = − 0.64), but neither correlated with the ADOS communication and reciprocal social interaction domains. On the other hand, both the ASR and RMET correlated with the Inappropriate Speech subscale of the ABC (ASR r = − 0.66, RMET r = − 0.52), and the ASR correlated with the Vineland communication subdomain and the adaptive behavior composite score (r = 0.46 and r = 0.40, respectively). It remains unclear whether the lack of group differences, despite existing correlations between the ASR and RMET and the Vineland and ABC, is due to the non-naturalistic test setting. 
In addition, the difference in how the concepts of socialization and communication are measured with the various clinical assessments (symptomatology/disability in ADOS vs. ability in Vineland (Klin et al. 2007) and problematic behaviors in ABC) could be an additional confounding factor to be taken into consideration. When looking at the individual emotions in the ASR, ASD participants did not identify disgust and happiness as easily as healthy controls, whereas they were able to identify fearfulness and surprise. This over-responsiveness to fearfulness and surprise observed in the ASD group is perhaps indicative of higher levels of anxiety or a lack of understanding and inappropriate expression of emotions in ASD (Shields et al. 1994; Sigman et al. 1992). A plausible mechanism for the higher level of anxiety could be altered activation of subcortical brain regions (i.e., the amygdala) involved in the processing of fearful faces, which has been shown with functional magnetic resonance imaging to differ between subjects with ASD and HCs (Kleinhans et al. 2015, 2011). These findings in ToM warrant further research to understand the underlying mechanisms. The higher level of complexity and effort required by both the RMET and ASR, compared with passive viewing of faces in the eye tracking and pupillometry, may have led to the lack of differences.

Limitations

Both included studies enrolled relatively small samples of all-male, high-functioning adults, limiting the generalizability of these findings. Participants with ASD were required to have ABC irritability subscale scores ≤ 13 and to undergo a 2-h infusion in Study 2, which may further limit applicability to lower-functioning, more severe and disruptive ASD phenotypes. The time required to complete the assessments was long (8 h), which may have placed a substantial cognitive burden on the participants. It can also be argued that the ASD and HC groups, while matched on age, were not well matched on other potential confounding factors. For instance, the race characteristics of the participants in the ASD and HC groups were different (i.e., a preponderance of Caucasian participants in the ASD group) and socioeconomic aspects were not considered. These fundamental differences between groups limit the applicability of these results to broader ASD populations. The analyses performed are exploratory. These include the post-hoc analysis of the ASD population by olfaction status based on the odor identification subtest of the Sniffin’ Sticks Screening 12 olfaction identification test. As such, between-group differences and p-values should be interpreted with caution and used as guidance for the selection of assessments in future studies. Finally, although the Screening 12 version of the Sniffin’ Sticks test was selected for its convenient administration in everyday clinical practice, other versions of the Sniffin’ Sticks test may allow more precise testing and therefore an improved characterization of olfactory performance.

Conclusions

Our results suggest a potential use of specific eye tracking tasks, pupillometry and olfaction tests for stratification, response sub-analyses, and outcome prediction in ASD trials. They also highlight the fact that abnormalities reported in young individuals with ASD may no longer be present to the same extent or with the same profile in adults with ASD. This points towards the view that the profile of abnormalities, and hence the characteristics of potential markers, may change with development. The activity monitoring, biological motion, and human activity preference eye tracking tasks and pupillometry differentiated best between participants with ASD and HCs. Our results implicate olfaction as a factor in the development of social cognition. It may be a simple and useful assessment for characterization of disease severity and for stratification in clinical trials. However, replication is needed for confirmatory purposes, and additional research should clarify sensitivity to change and links to functional outcomes.