The Social Shapes Test as a Self-Administered, Online Measure of Social Intelligence: Two Studies with Typically Developing Adults and Adults with Autism Spectrum Disorder

Brown, Matt I.; Heck, Patrick R.; Chabris, Christopher F.

doi:10.1007/s10803-023-05901-2

The Social Shapes Test as a Self-Administered, Online Measure of Social Intelligence: Two Studies with Typically Developing Adults and Adults with Autism Spectrum Disorder

Original Paper
Published: 09 February 2023

(2023)
Cite this article

Download PDF

Journal of Autism and Developmental Disorders Aims and scope Submit manuscript

The Social Shapes Test as a Self-Administered, Online Measure of Social Intelligence: Two Studies with Typically Developing Adults and Adults with Autism Spectrum Disorder

Download PDF

Matt I. Brown ORCID: orcid.org/0000-0002-4258-1961^1,2,
Patrick R. Heck¹ &
Christopher F. Chabris¹

1789 Accesses
3 Altmetric
Explore all metrics

Abstract

The Social Shapes Test (SST) is a measure of social intelligence which does not use human faces or rely on extensive verbal ability. The SST has shown promising validity among adults without autism spectrum disorder (ASD), but it is uncertain whether it is suitable for adults with ASD. We find measurement invariance between adults with (n = 229) or without ASD (n = 1,049) on the 23-item SST. We also find that adults without ASD score higher on the SST than adults with ASD (d = 0.21). We also provide two, 14-item versions which demonstrated good parallel test-retest reliability and are positively related to scores on the Frith-Happé task. The SST is suitable for remote, online research studies.

Understanding Social Anxiety Disorder in Adolescents and Improving Treatment Outcomes: Applying the Cognitive Model of Clark and Wells (1995)

Article Open access 13 April 2018

Development and Validation of the Camouflaging Autistic Traits Questionnaire (CAT-Q)

Article Open access 25 October 2018

“Putting on My Best Normal”: Social Camouflaging in Adults with Autism Spectrum Conditions

Article Open access 19 May 2017

Social and emotional skills are important for adaptive functioning in everyday life (Soto et al., 2020). Clinical researchers have developed an array of psychological assessments to measure these skills (e.g., Abrahams et al., 2019) and to explain difficulties in social responsiveness or behavior observed in various psychological conditions including autism spectrum disorder (ASD; Morrison et al., 2019) and schizophrenia (Pinkham et al., 2018). Cognitive neuroscience researchers have also used these tools to identify brain areas associated with social functioning (Schaafsma et al., 2015; Schurz et al., 2021). Due in part to the COVID-19 pandemic, however, many clinical and research practices have needed to shift from administering in-person assessments to using measures that can be administered remotely during telemedicine visits or online studies (Türközer & Öngür, 2020). This shift to online, non-proctored assessment has already become the predominant mode in employment testing (e.g., Tippins 2015) prior to the pandemic. In contrast, many of the assessments used in clinical or developmental research are designed for in-person, proctored administration. Few studies to date have documented the psychometric properties of social intelligence measures in clinical and typically developing samples (e.g., Gourlay et al., 2020; Pinkham et al., 2018), and there has been little research to determine whether any of these instruments are suitable for remote, online administration.

Although recent work has helped establish new, web-based cognitive ability assessments (e.g., Biagianti et al., 2019; Liu et al., 2020; Wright, 2020), few studies have focused on designing remote measures of social intelligence. This is important given that social intelligence is generally affected by ASD or similar, developmental disorders (Velikonja et al., 2019). Presently, many of the social intelligence measures that can be administered online rely on self- or observer-reports such as the Social Responsiveness Scale (SRS; Constantino et al., 2003), the Autism Quotient (AQ; Baron-Cohen et al., 2001b), or the Broad Autism Phenotype Questionnaire (BAPQ; Hurley et al., 2007). In contrast, many of the validated, performance-based social intelligence tests are designed to be completed in-person and administered by a proctor or trained clinician. This makes them ill-suited for remote administration and presents a challenge given the growing need for remote, web-based assessments. To this end, the Social Shapes Test (SST; Brown et al., 2019) was developed as a simple, self-administered social intelligence test based on the animated shape task created by Heider and Simmel (1944). To date, however, the SST has only been validated for use with adults without ASD. Therefore, we conducted the present study to examine whether the SST is appropriate for use as a remote, performance-based social intelligence test for adults with ASD.

We consider the SST, along with other existing animated shape tasks, to be measures of social intelligence (SI). We define SI as the ability to perceive and decode the internal states, motives, and behaviors of others (Mayer & Salovey, 1993; Lievens & Chan, 2010). This operational definition overlaps with those for constructs commonly studied in autism research like mentalizing and Theory of Mind (ToM; Luyten et al., 2020). Some scholars have recently expressed concern regarding the accumulation of narrowly defined social and emotional abilities and the potential for jingle and jangle fallacies (Olderbak & Wilhelm, 2020; Quesque & Rossetti, 2020). An example of this concern is that a task in which individuals are asked to identify mental states from pictures of human faces (e.g., the Reading the Mind in the Eyes test; Baron-Cohen et al., 2001) has been variously characterized as a measure of Theory of Mind, mentalizing ability, empathic accuracy, face processing, and emotion recognition across different studies (Oakley et al., 2016). Therefore, we use the more inclusive term SI given its long history in psychological research and its broader use across research fields relative to other terms (e.g., Theory of Mind which is more specific to developmental research, or mentalizing, which is more specific to social cognitive neuroscience).

Measuring Social Intelligence Using Animated Shape Tasks

The original animated shape task was developed by Heider & Simmel (1944) who famously observed that research participants often described the movements of simple animated, geometric shapes in human psychological terms. This pioneering work inspired several streams of research where scholars sought to identify individual differences in Theory of Mind or mentalizing ability using the original film or newly created shape animations. Klin (2000) used the original Heider and Simmel film to create the Social Attribution Task (SAT). In this task, individuals were shown the film and asked to provide written responses to 17 questions about the events in the film (e.g., “What happened to the big triangle”). Each question was asked after participants viewed specific segments of the film. These responses were scored by human raters based on the use of specific kinds of terms indicating concepts such as emotions, mental states, or behaviors. Klin reported that individuals with autism or Asperger’s disorder made fewer social attributions compared to individuals without ASD as indicated by using fewer mental or emotional state terms, mentioning fewer personality features of the shapes, and difficulty identifying the social meaning of the shapes’ movements. Likewise, SAT scores have been found to predict the severity of ASD-related social symptoms among a sample of children with ASD but average general intelligence (Altschuler et al., 2018). Researchers have observed modest test-retest reliability for SAT scores. Most recently, Altschuler and Faja (2022) reported stronger reliability for spontaneous ToM and cognitive ToM scores but slightly weaker reliability for affective ToM scores. A modified version of the SAT has also been used in neuroimaging research to identify differences in activation of brain regions related to social information processing between individuals with and without ASD (e.g., Vandewouw et al., 2021).

The Frith-Happé animation task is similar to the Social Attribution Task but consists of 12 short films with each featuring two animated triangles (Abell et al., 2000). In each film, the movements of the triangles are meant to depict interactions involving mental states, only physical, goal-directed interaction, or purposeless movement. A recent meta-analysis studies which have used the Frith-Happé animations (k = 33 studies) found that individuals with ASD are less able to correctly categorize animations which are designed to depict mentalizing compared to animations containing only goal-directed or random movement (Wilson, 2021). In addition, the Frith-Happé animations have also been used to identify similar difficulties in social attribution among adult patients with schizophrenia (Martinez et al., 2019).

Although most studies have focused on mental state attributions from written responses to these animated shape stimuli, some scholars have adapted these tasks into a multiple-choice test format. A 19-item, multiple choice version of the Social Attribution Task (SAT-MC) was designed by Bell and colleagues (2010). This test uses the same film as the SAT but replaces the narrative responses with targeted multiple-choice questions which are scored as either correct or incorrect. Performance on the SAT-MC has been found to be positively related to performance on other social cognition tasks including the Bell-Lysaker Emotion Recognition Task and the Mayer-Salovey-Caruso Emotional Intelligence Test (Bell et al., 2010). Adults with schizophrenia have also been found to perform significantly worse on the SAT-MC compared to a group of healthy controls (Johannesen et al., 2018; Pinkham et al., 2018). SAT-MC scores were also found to be positively related with social skills as assessed by a standardized role-playing task. In addition, the test has also displayed promising validity in autism research where a recent pilot study by Burger-Caplan et al. (2016) found that children with an ASD diagnosis scored 0.87 standard deviations worse on the test compared to healthy controls.

Similar to the SAT-MC, White and colleagues (2011) designed a multiple-choice task using the Frith-Happé animations. In this version of the task, individuals are asked to correctly categorize each film as demonstrating either theory of mind, physical interaction, or random movement. Performance on this task has been found to positively correlated with performance on other social intelligence tasks while also displaying modest group score differences favoring IQ-matched, typically developing adults (Brewer et al., 2017). This task has also been administered as an online task completed by adults with and without ASD diagnoses in recent research (Livingstone et al., 2021). However, the multiple-choice version of the Frith-Happé animation task has only been used in seven of the 33 studies identified by Wilson (2021).

One benefit of these various animated shape tasks is that they rely less on reading skill or verbal knowledge and comprehension compared to other measures of SI. For example, tasks like the Faux Pas (Baron-Cohen et al., 1999) or the Hinting task (Corcoran et al., 1995) require reading and interpreting written descriptions of social interactions. Other tasks like the Reading in the Mind in the Eyes Test (RMET; Baron-Cohen et al., 2001a) require knowledge of words used to describe emotional or mental states which are not commonly used in everyday language (Kittel et al., 2022; Peterson & Miller, 2012). These tasks may confound social intelligence with verbal ability where some individuals could use their verbal skills to compensate for low SI (Livingston & Happé, 2017). Another advantage to animated shape tasks is that they are abstract and do not include any obvious cultural or gender cues. These cues, such as those present in emotion recognition tasks which only use faces of White or Caucasian individuals, can result in mean test score differences due to race or ethnicity among clinical and nonclinical populations (Dodell-Feder et al., 2020; Pinkham et al., 2017). In contrast, animated shape tasks have displayed little if any racial or ethnic group differences in past research which makes them potentially suitable for studies involving international samples (Brown et al., 2019, 2022; Lee et al., 2018). However, several of the existing animated shape tasks are not well-suited for remote testing. For example, the original SAT and Frith-Happé animations, require a clinician or administrator to ask questions and to record and score verbal responses from participants. Not only could this introduce confounding effects of verbal ability or rater bias, but it also increases administration time and financial costs (Livingston et al., 2019). Although more recent versions of these tasks use a fixed set of multiple choice questions (e.g., SAT-MC), an administrator is still needed in order to operate the specific video segments for each question. This is problematic because it prevents participants from being able to complete these tasks remotely which likely prevents some researchers from using these tasks as studies increasingly shift from being conducted in-person to online. Other alternative versions of these tasks are primarily designed for neuroimaging studies and are also not well-suited for brief, online assessment (Ludwig et al., 2020).

The Social Shapes Test (SST)

The SST is a 23-item multiple choice test designed to measure individual differences in social intelligence among neurotypical adults. Each SST item consists of a short, 13–23 s animated video which includes a standard set of colored, geometric shapes. Each video features a different social plot where the shapes display a variety of behaviors including bullying, helping, comforting, deceiving, and playing. Some animations were designed to mimic the bullying behavior which appears in the original Heider and Simmel video. Others were designed to represent false belief tasks. In sum, these animations have been found to elicit a similar degree of social attributions in written descriptions compared to those reported by Klin in a prior study (Ratajska et al., 2020). In this study, Ratajska et al. (2020) scored narrative descriptions for each of the SST videos using Klin’s (2000) Theory of Mind indices and found that the range of scores for SST items overlapped with those reported for the original Heider and Simmel film. All videos are controlled by the participant and can be viewed as many times as desired. Before starting the SST, participants are given the following instructions:

“In this task, you will see a series of short, silent, animated videos. The shapes in these videos can be interpreted as people interacting with each other.

First, please watch each video carefully and completely. After watching the video, select the best answer to the multiple choice question listed below the video. Make sure to answer all of the questions to the best of your ability.

Next is a practice item. Please watch the video and try your best to answer the question. Note that you are allowed to replay a video as many times as you want while answering the question. Please do not expand the videos to full screen.”

Next, all participants are given a sample item followed by feedback indicating the correct response (Fig. 1). All 23 items are subsequently administered in the same order for all participants.

Unlike other SI tasks, the SST was explicitly designed to be completely self-administered online, as was done in initial validation studies (Brown et al., 2019). All questions are scored using an objective scoring key which helps prevent potential rater bias when scoring open responses which are used in other animated shape tasks (White et al., 2011). Like the SST, an updated version of the Frith-Happé animations was developed for remote, online administration (Livingston et al., 2021). Although versions of the Frith-Happé animations have been found to detect differences in social intelligence between neurotypical adults and adults with ASD, they have rarely been administered to large samples of typically developing adults. Lastly, all SST questions and video files are freely available for research use and can be accessed via the Open Science Framework (https://osf.io/sqxy6). Researchers are free to use the SST videos to administer the test as part of an online survey or to adapt the videos in order to suit their own individual studies. This makes the SST more easily accessible for researchers, especially compared to other video-based social intelligence tests owned or distributed by commercial test publishers (e.g., The Awareness of Social Inference Test – TASIT; McDonald, 2012). The video content in the SST is also relatively short (each animation ranges between 13 and 23 s in length) which helps minimize administration time compared to other, video-based measures of social intelligence (e.g., Movie for the Assessment of Social Cognition; Dziobek et al., 2006).

The SST is also unique in that that it was originally developed and validated using samples of undergraduate college students and crowdsourced participants from Amazon Mechanical Turk (MTurk) who were not selected for prior history or diagnosis of ASD. In these studies, the SST has demonstrated modest internal consistency (α > 0.65) and promising convergent validity with other performance measures of social intelligence. Among MTurk workers, SST scores were found to be positively related to emotion recognition ability as assessed by the RMET (r = .47). Individuals who scored higher on the SST were also more effective at identifying the correct emotion or mental state based on written scenarios in the Situational Test of Emotional Understanding (r = .48; Brown et al., 2019). In a subsequent study of undergraduate psychology students, those who scored higher on the SST were better at identifying the best behavioral solutions to interpersonal workplace situations in a situational judgment task (r = .40; Brown et al., 2022). These relationships remained even after controlling for differences in more general cognitive abilities (e.g., verbal or spatial abilities) and educational attainment. Despite these promising results, however, it is uncertain whether the SST can adequately assess differences in social intelligence among adults with ASD or other developmental disorders.

Present Study

We designed the present study to investigate whether the SST is appropriate to be self-administered remotely to measure social intelligence among adults with ASD. Our first aim is to test for measurement invariance of the SST between adults with or without ASD. Our second aim is to collect further validity evidence for the SST by testing whether unaffected, typically developing adults score higher on the test compared to adults who have been diagnosed with ASD. Based on similarities in test content (e.g., use of similar, geometric shape animations) and existing convergent validity evidence from typically developing adult samples, we expect that the SST and other animated shape tasks measure a similar, underlying social intelligence construct. Therefore, we expect that adults with ASD should score lower on the SST compared to adults without ASD as observed in prior research using similar animated shapes tasks like the Frith-Happé animations or the SAT-MC (Burger-Caplan et al., 2016; Livingstone et al., 2021; Wilson, 2021). We also conducted a second study to gather further reliability and validity evidence for two, alternate 14-item forms of the SST and to compare performance on the SST with scores on an existing animated shape task (the Frith-Happé animation task; White et al., 2011).

Study 1

Methods

Participants

Participants in Study 1 included of a variety of adults with or without a prior diagnosis of autism spectrum disorder (ASD). We recruited 261 participants who self-reported a diagnosis of ASD, autistic disorder, or Asperger’s disorder from the Simons Foundation Powering Autism Research for Knowledge (SPARK; SPARK Consortium, 2018). This cohort consists of individuals with ASD and their first-degree relatives. All of these individuals who were recruited for this study presently live independently and did not have a record of cognitive impairment when they joined the SPARK cohort. A broader description of adults in the SPARK cohort was recently reported by Fombonne et al. (2020). All SPARK participants were given a $10 Amazon gift card for completing the study. Although diagnosis history was collected using either self- or parent-reports, rather than direct clinical evaluation, past research has found that this method yields reliable accounts of autism diagnoses in other research registries (Daniels et al., 2012).

To account for the lack of clinical data for the independent adult ASD sample recruited via SPARK, we also recruited a second sample of 25 adults who had previously received a clinical diagnosis of ASD from a neurodevelopmental clinic. All of these 25 individuals had sought clinical services in the Northeastern U.S. and had consented to be contacted for ongoing research studies. Due to the smaller pool of eligible participants compared to the SPARK cohort, these participants were given a larger reward of a $35 Amazon gift card for completing this study. All participants were recruited online and completed the SST without a proctor or administrator. Participants ranged from 18 to 34 years of age (mean age = 20.8, SD = 3.9). Most participants identified as male (20/25; 80%) and as White, non-Hispanic (22/25; 88%). Based on assessment scores obtained from the electronic medical record, the average full-scale IQ score among the ASD group was 86.1 (SD = 22.4). T-scores from the Social Responsiveness Scale (SRS; Constantino et al., 2003) indicated an elevated level of autistic symptoms among most of the participants in clinical ASD group as well (M = 74.2, SD = 12.4).

We also recruited adults without ASD for this study. One group of adults without ASD was recruited from SPARK; these individuals were parents of one or more children with an ASD diagnosis who themselves had never received a diagnosis (SPARK parent; n = 217). Although these adults did not report any history of ASD, they may be at a greater genetic risk for ASD compared to the general population. Therefore, we also relied on data collected from adult participants in two prior studies (Brown et al., 2019, 2022) for a comparison group of adults without ASD. Unlike the SPARK parents, we assume that the adult participants from prior studies were not likely to share a potential genetic predisposition to ASD or to other developmental disorders given the relatively low rate of ASD in the general population. There was also no focus on ASD or developmental disorders in either of the prior studies from which these participants were recruited. A total of 829 participants were recruited from undergraduate psychology courses at a public university in the Midwestern U.S. and from Amazon’s Mechanical Turk. Most of these participants identified as female (59%) and White, non-Hispanic (56%). All prior study adult participants had completed the SST as part of a self-administered, online Qualtrics survey.

Data Cleaning

Prior to data analysis, we removed participants who had a median response time of less than 10 s per SST item. This response time threshold was chosen to increase the likelihood that participants watched the entire video for each item and to remove potential cases of non-purposeful responding (all 23 SST videos were 13 s in length or longer). We observed that a greater proportion of the adults without ASD were removed based on our response time threshold (21%) compared to SPARK participants with ASD (3%). A total of five participants were removed from the clinical sample. We also removed participants who failed to respond correctly to all four attention-check items. In each such item, participants were asked to watch a different shape animation and to identify which of four shapes did not appear in the video. The attention-check items depend only on basic cognitive processes (e.g., vision, attention, memory) and should not require social intelligence to solve. More than 90% of participants with or without ASD were able to correctly identify the missing shape in all four items. This left us with a total sample of n = 1,275 participants (ASD n = 229; SPARK parent n = 217; without ASD n = 829). We provide a full summary of the key demographic variables for each group in Table 1.

Table 1 Study 1 Participant Demographics

Full size table

Procedure

All study materials were presented in an online survey which was accessed via a link sent using email. Participants were given a brief set of instructions and a practice item before beginning the SST. Afterwards, participants completed several demographic items regarding their geographical location, educational attainment, self-identified race/ethnicity, and approximate annual income. Participant sex and age for participants recruited via SPARK was provided by the SPARK consortium. All SST scores were calculated as the simple sum of correct responses across the 23 items.

Statistical Analysis

All analyses were performed using R version 3.6.3. We tested for measurement invariance using confirmatory factor analysis models estimated using the lavaan package (Rossel, 2012). We also used multiple linear regression in order to statistically control for demographic differences between the three diagnosis groups. We report the standardized mean difference (Cohen’s d) when interpreting differences in SST scores based on ASD group where negative values indicate lower scores compared to adults without ASD.

Results

In order to investigate our first aim and determine whether the SST functions similarly for adults with or without ASD, we tested for measurement invariance between participants with or without ASD. It is important to rule out measurement invariance in order to determine whether the observed score differences between groups are due to true differences in the construct of interest and not a result of differences in the test’s measurement properties (Vandenberg & Lance, 2000). We focused on metric invariance which tests whether the primary factor loadings for test items are equal across different groups. This is tested by first specifying a single-factor, confirmatory factor analysis model where all 23 SST items load on to a single factor. Factor loadings were then estimated for all adults without ASD (participants from prior studies and SPARK parent groups combined). Next, this model is estimated using response data from the ASD group while constraining each item loading to be equal to the estimate from the group without a prior diagnosis. Constraining these factor loadings to be equivalent did not significantly reduce overall model fit, ∆χ² (22) = 30.05, p = .12. These results were further supported by comparable estimates for internal consistency of the SST separately for participants with ASD (α = 0.72) and without ASD (α = 0.67). Item difficulties (percent correct) within the SST were also highly consistent between groups (r = .97, p < .001). This indicated that the items which were most difficult for adults with ASD were also most difficult for adults without ASD. Based on these results, we conclude that the SST provides an equivalent measure of social intelligence for adults regardless of ASD diagnosis. We report the psychometric properties and descriptive statistics for the SST within each group in Table 2. Therefore, any subsequent differences in SST scores between groups can be attributed to differences in social intelligence ability and not differences in test functioning between the two groups.

For our second aim, we tested for differences in SST scores between participants with and without ASD using linear regression. Given the heritability of ASD, we also tested for differences between adults recruited for prior SST studies and parents of an affected child (SPARK parents). We first regressed SST scores on dummy-coded diagnosis variables representing adults with ASD and SPARK parents (Model 1 in Table 3). A statistically significant regression coefficient for either dummy-coded variable indicates a meaningful difference in SST scores compared adults without ASD from prior studies. Participants with ASD did score significantly lower on the SST relative to adults without ASD from prior studies (β = –0.08, p = .006, d = − 0.21, 95% CI = [–0.35, − 0.06]). We provide a histogram illustrating this difference in SST scores in Fig. 2. SPARK parents also scored lower on the SST compared to adults without ASD but this difference was not statistically significant (β = –0.03, p = .22, d = − 0.09, 95% CI = [–0.25, 0.06]). However, we observed several differences in the demographic makeup across our three comparison groups which may affect these observed test scores (Table 1). In particular, SPARK parents reported greater educational attainment and were older than the other two groups on average. Adults without ASD from prior studies were younger on average, reported greater educational attainment compared to adults with ASD, and were less likely to identify as White, non-Hispanic. Therefore, we also tested for differences in SST score after statistically controlling for participant age, educational attainment, and race/ethnicity. Participants with ASD still scored lower on the SST after controlling for these demographic differences (β = –0.10, p < .001; d = − 0.27). In addition, we observed a significant difference between adults without ASD and SPARK parents when holding age, race/ethnicity, and education constant (β = –0.07, p < .001; d = − 0.20). Among the demographic control variables, educational attainment was positively related to SST scores (β = 0.15, p < .001) and participants who identified as White scored higher on the SST, compared to all others (β = 0.10, p < .001). Age was not a significant predictor of SST scores in this regression model.

Table 2 SST Descriptive Statistics by Diagnosis Group in Study 1

Full size table

Table 3 Differences in SST Scores between Diagnosis Groups in Study 1

Full size table

Study 2

Although the 23-item SST typically takes roughly 15 min to complete, this may be too long for some studies involving a series of different assessments and measures. Prior studies have developed shorter versions of commonly used SI tests (e.g., short-form Reading the Mind in the Eyes Test; Olderbak et al., 2015) to accommodate researchers who wish to include SI performance measures without making the length of the study discouraging or prohibitive to prospective participants. Therefore, we conducted Study 2 to develop an abbreviated form of the SST which could be used in situations where a shorter administration time is needed while retaining the psychometric properties of the full 23-item version. We also conducted this study in order to estimate the test-retest reliability of this shorter version of the SST. The 23-item version has also only displayed modest internal consistency with estimates ranging between α = 0.67 (Brown et al., 2019) to 0.72 among adults with ASD in Study 1. However, some scholars have argued that internal consistency underestimates the reliability of tests when item content is heterogeneous (Neubauer & Hofer, 2022). We aim to estimate the reliability of the SST as the test-retest reliability between two alternate forms as demonstrated for the SAT-MC by Pinkham et al. (2017).

A second goal of Study 2 is to gather further validation evidence for the SST by observing convergent validity with another animated shape task. As noted earlier, the Frith- Happé animation task was recently evaluated for remote, online use in research studies (Livingstone et al., 2021). Although this task uses similar shape animations as featured in the SST, there are some differences between the two measures. A prior version of the Frith-Happé task included questions where participants needed to identify mental or emotional states of specific shapes (White et al., 2011), but most of the research using the tool has only administered questions where participants only categorize the content of each video. In contrast, the SST features more focal shapes in each video (four or five shapes compared to only two in the Frith Happé task) and all SST videos involve social interactions. This potentially allows for greater granularity in measuring differences in social intelligence relative to the four Theory of Mind items in the Frith-Happé task. We include both measures in Study 2 in order to estimate the correlation between the two tasks and to observe whether each task can predict incremental variance in performance on a separate, video-based social intelligence test after controlling for individual differences in general intelligence.