Introduction

As highly social animals, humans are tasked daily with navigating complex social interactions. To succeed in these interactions, we often rely on social cognitive processes that allow us to understand the targets of social interaction, including the ability to perceive and empathize with others’ emotions (Barrett et al., 2011; Singer & Klimecki, 2014). In particular, theory of mind (ToM), or mentalization, describes people’s ability to recognize and understand the mental states of other people (Gallagher & Frith, 2003). ToM has been positively correlated with social competence (Bosacki & Wilde Astington, 1999; Liddle & Nettle, 2006) and negatively associated with aggressive tendencies (Meier et al., 2006; Mohr et al., 2007). ToM is also positively correlated with social cooperativeness (Paal & Bereczkei, 2007) and social network size, such that individuals with better ToM abilities reported larger networks of friends compared with individuals with less-developed ToM abilities (Liddle & Nettle, 2006; Stiller & Dunbar, 2007). Collectively, these findings suggest the importance of ToM in facilitating successful social interactions.

Although nearly all people are capable of demonstrating ToM to some degree, research on individual differences in ToM can show how variations in ToM ability influence real-world social outcomes. For example, poor ToM performance has been a predictor in mental health research; people with autism (Baron-Cohen et al., 1985; Pinkham et al., 2008) and schizophrenia (Harrington et al., 2005; Pedersen et al., 2012; Pinkham et al., 2008)—two disorders linked with social impairments such as the inability to form appropriate relationships with others (Sasamoto et al., 2011)—have been found to have poor ToM ability. Poor ToM ability has also been correlated with a lack of understanding of how one’s actions affect other people, and with difficulty in accurately assessing others’ intentions (Baker, 2003).

The current literature on individual differences in ToM ability is limited. What does exist is largely siloed into subfields, such as developmental or clinical psychology. A large portion of such work focuses on variations in the development of ToM among children (Bowman et al., 2017; Devine & Hughes, 2013; Wang et al., 2016) using false-belief tasks that do not capture the full complexity of ToM (Altschuler et al., 2018; Apperly, 2012; Tager-Flusberg, 2011). Much of the remaining research focuses on ToM deficits in those with mental illness or developmental disabilities (Baron-Cohen et al., 1985; Dahlgren et al., 2010; Kerr et al., 2003; Pedersen et al., 2012), but such studies often are limited in their sample size and methods of assessing ToM ability. Additionally, a majority of research on ToM uses only single tasks to measure this ability, likely failing to capture the breadth of skills and abilities encompassed by ToM (Altschuler et al., 2018; Apperly, 2012). Considering the scarce research on variation among healthy adults, further research on individual differences in ToM ability could help to elucidate the underlying causes for variation in ToM and how this variation might relate to functional outcomes in daily life, eventually paving the way for more effective identification and intervention for those with deficits in these abilities (regardless of whether they meet criteria for specific psychological or developmental disorders).

Theory of Mind and the Default Network

Investigating the neural substrates of individual differences in ToM is one potentially useful approach to increasing scientific understanding of ToM. ToM has been consistently linked to brain regions such as the dorsal medial prefrontal cortex (dmPFC) and temporoparietal junction (TPJ; Allen et al., 2017; Amodio & Frith, 2006; Carrington & Bailey, 2009; Frith & Frith, 2006; Sabbagh et al., 2004; Saxe & Kanwisher, 2003; Saxe & Powell, 2006; Saxe & Wexler, 2005; Schurz et al., 2014; Spunt & Lieberman, 2012; Vogeley et al., 2001; Young et al., 2010). These regions of the brain are included in what is called the default network. Originally conceptualized as the brain’s default mode because it was found to be more active when people were merely at rest in the MRI scanner rather than engaged in tasks involving outwardly-directed attention, the default network is now thought of as a network broadly involved in internal simulation, and tasks have been identified that activate it specifically (Andrews-Hanna et al., 2014; Smith et al., 2009). Functions of the default network appear to include simulating the mental states of others and simulating one’s own experience during memory, prospection, or fantasy (Allen et al., 2017; Blain, Grazioplene, et al., 2020a; Mars et al., 2012; Meyer, 2019; Nettle & Liddle, 2008; Schilbach et al., 2008, 2012; Schurz et al., 2014; Seitz et al., 2006; Tamir et al., 2016). It is worth noting that the so-called “social brain” encompasses additional regions beyond the default network that are involved in social cognition and social interaction (Adolphs, 2009; Brothers, 1990; Frith & Frith, 2010). Other structures, such as the anterior cingulate cortex and insula, appear to be more involved in affective empathy and processes that allow us to vicariously experience and detect the personal relevance of others’ emotions (Bernhardt & Singer, 2012; Jackson et al., 2006). 
For the purposes of this study, we focused our investigations on the default network, which has been most widely implicated in research on the neural correlates of various kinds of social cognition, and especially theory of mind (Andrews-Hanna et al., 2014; Buckner et al., 2008; Mars et al., 2012; Meyer, 2019; Schilbach et al., 2008, 2012).

Three subsystems of the default network have been identified: 1) core, 2) dorsal medial, and 3) medial temporal subsystems (Figure 1; Andrews-Hanna et al., 2014). Although integrated into a single larger functional network, these three subsystems show some degree of functional specialization, with the dorsal medial subsystem exemplifying the strongest specific associations with ToM tasks (Allen et al., 2017; Buckner et al., 2008; Spreng & Andrews-Hanna, 2015). Nonetheless, the broader default network appears to be important for social cognition and ToM. For instance, the core subsystem of the default network, although primarily active when dealing with personally relevant information and self-related processes, includes several brain regions that have been more specifically implicated in social cognitive processing, including the posterior cingulate cortex (PCC), anterior medial prefrontal cortex (amPFC), and the angular gyrus (Andrews-Hanna et al., 2014; Hyatt et al., 2015; Yeo et al., 2011). Perhaps least related to ToM and social cognitive processing, the default network’s medial temporal subsystem is typically associated with autobiographical thoughts and memories, though it also has shown links to the overall default network functions of mental simulation and imagination (Spreng & Andrews-Hanna, 2015). As mentioned, the dorsal medial subsystem is most closely associated with social cognition and contains some of the regions most studied in research on the social brain (i.e., the dmPFC and TPJ), but it is worth mentioning that this subsystem also appears to play an important role in language comprehension generally (Spreng & Andrews-Hanna, 2015). 
In light of existing research on the role of the default network and its subsystems, we can expect that the default network broadly, but the dorsal medial subsystem in particular, will be active during social processing tasks and that individual differences in the function of these networks might underpin individual differences in social cognitive abilities.

Fig. 1.

Default Network Subsystems in the Human Connectome Project. Note. DN = Default Network. The above figure displays the three default network subsystems identified by Yeo et al. (2011).

Theory of Mind and Agreeableness

When studying individual differences, it is useful to make connections with the broad personality models that attempt to identify the major domains of psychological variation. The most thoroughly validated and widely used model of personality is the Five Factor Model or Big Five, which describes the major dimensions of covariation among human personality traits (John et al., 2008). One of the Big Five, Agreeableness, which describes traits related to altruism and cooperation, has been associated with variations in ToM ability (Allen et al., 2017; Nettle & Liddle, 2008) and thus provides a particularly useful context for understanding individual differences in social cognitive abilities and associated neural networks. People with high levels of Agreeableness tend to be described as compassionate, polite, and kind, whereas those with low levels of Agreeableness are described as disagreeable, antagonistic, and callous (Allen et al., 2017; Krueger et al., 2012; Laursen et al., 2002). Agreeableness has been shown to correlate positively with many of the same beneficial social outcomes as ToM ability (Allen et al., 2017; Ozer & Benet-Martinez, 2006), indicating the importance of further exploring the relation between ToM and Agreeableness.

Because Agreeableness is one of the traits most strongly related to individual differences in interpersonal behavior (DeYoung et al., 2013; Graziano & Eisenberg, 1997; Koole et al., 2001), better elucidation of Agreeableness and its associated cognitive mechanisms (including social cognitive processes, such as ToM) could allow us to better predict and understand variation in interpersonal behavior and relationship functioning. This research also has the potential to contribute to theoretical models of personality. Until recently, much more emphasis has been placed on the characterization, rather than explanation, of variation in personality, and this is particularly true for Agreeableness (Nettle & Liddle, 2008). Examining ToM and social cognitive ability as one potential correlate of variation in Agreeableness, and examining the relation of both constructs to underlying variation in the default network, would contribute to neurocognitive accounts attempting to explain individual differences in Agreeableness and associated interpersonal outcomes (Allen & DeYoung, 2017; DeYoung & Weisberg, 2018; Xiao et al., 2019).

Utility of Latent Variable Modeling

To justify claims regarding the underlying associations among constructs—for example, Agreeableness, social cognition, and default network function—we must first be able to assess each of those constructs individually, in a way that is reliable and valid. Concerns of reliability and validity are especially important when using behavioral tasks, as even tasks that are able to detect robust effects at the group level (e.g., tests of implicit bias or self-regulation) often fail to produce reliable measurement of individual differences (Enkavi et al., 2019; Hedge et al., 2018; Schnabel et al., 2008). Fortunately, questionnaire measures of personality and tests of general or social cognitive ability tend to have better reliabilities than many of the measures commonly used in other areas of psychology (Hedge et al., 2018; Morrison et al., 2019; Pinkham et al., 2018; Vellante et al., 2013). Nonetheless, we can further increase our ability to reliably measure these constructs and estimate their associations with other variables by using latent variable methods, such as structural equation modeling (SEM), which models the prediction of latent variables by other latent variables.

Latent variables represent the shared variance of multiple measured (or manifest) variables (Schumacker & Lomax, 2004). For example, a latent social cognitive ability variable might be modeled as the shared variance of accuracy scores across different social cognitive ability tasks. Assessing variables of interest at the latent level allows for more robust conclusions, as latent variables capture only the shared variance of their indicators, thereby eliminating unsystematic error variance and more accurately capturing variability in the underlying constructs of interest (Keith, 2006). Single-task, performance-based indicators often are limited in their scope and measure constructs narrower than those they purport to represent (Apperly, 2012; Blain, Longenecker, et al., 2020b). Performance on any given task is influenced by a number of task-specific factors, but using multi-indicator designs and latent variable frameworks allows us to move toward measuring constructs more reliably as what is shared across multiple tasks, thereby avoiding underestimation of true effect sizes (Blain, Longenecker, et al., 2020b; Campbell & Fiske, 1959; Eisenberg et al., 2019; Enkavi et al., 2019; Nosek & Smyth, 2007). Modeling social cognitive ability as the shared variance in performance across tasks should give a better representation of true variance in social cognitive ability by factoring out unique task variance (which includes a combination of task-specific variance and error).
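The attenuation argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the latent ability, the three tasks, the outcome, and all effect sizes are hypothetical, and a unit-weighted composite stands in for a latent variable (a full SEM would additionally estimate loadings and remove the remaining unique variance).

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation computed from first principles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=1):
    """Return (single-task r, three-task composite r) with a simulated outcome."""
    rng = random.Random(seed)
    # Hypothetical latent ability that all tasks are assumed to share.
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Three hypothetical tasks: shared ability plus unique task variance.
    tasks = [[a + rng.gauss(0, 1) for a in ability] for _ in range(3)]
    # Hypothetical outcome that depends on the latent ability only.
    outcome = [0.5 * a + rng.gauss(0, 0.75 ** 0.5) for a in ability]
    # Unit-weighted composite: the mean of the three task scores per person.
    composite = [statistics.fmean(scores) for scores in zip(*tasks)]
    return pearson_r(tasks[0], outcome), pearson_r(composite, outcome)


r_single, r_composite = simulate()
print(f"single task r = {r_single:.2f}, composite r = {r_composite:.2f}")
```

In this setup the simulated latent ability correlates .50 with the outcome, so both estimates are attenuated, but the composite less so, because averaging across tasks reduces the share of unique task variance.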

SEM also shows promise for analyzing brain function. Such analyses can be facilitated using atlases, such as the cortical parcellation created by Schaefer et al. (2018), where each parcel (a functionally homogeneous region of the cortex) is assigned to one of the large-scale functional networks identified by Yeo et al. (2011). This local-global parcellation scheme provides an ideal opportunity for the implementation of SEM, as the activation of a given neural network can be modeled as the shared variance of activation scores for its constituent parcels. In SEM, parcels with variance more representative of the overall network receive higher weighting in the computation of a latent variable representing overall network activation. These latent variables, representing brain activity in a given network, can then be examined as predictors of various behavioral variables, such as personality or task performance. In the current research, we leverage the advantages of SEM to investigate brain-behavior associations.

Current Study

Research on the default network and Agreeableness provides a promising avenue to improve understanding of individual differences in ToM ability. The current study used functional magnetic resonance imaging (fMRI) to investigate relations among these constructs. Specifically, we investigated whether individual differences in activity of the default network during a ToM task (Abell et al., 2000; Castelli et al., 2000) were related to individual differences in social cognitive ability and Agreeableness. We hypothesized that neural activity in the default network, and in particular its dorsal medial subsystem, would be greater when participants viewed social (ToM) versus nonsocial animations (Hypothesis 1). Furthermore, we hypothesized that ToM-related activity in the default network would positively predict participants’ ToM ability as indicated by accuracy on the triangles task (Hypothesis 2a) and by the shared variance of performance on multiple social cognitive tasks (Hypothesis 2b). Finally, we expected that the same ToM-related activity in the default network would be positively associated with the personality trait of Agreeableness (Hypothesis 3).

Method

Data and materials for the current study are available on the Human Connectome Project’s website: https://www.humanconnectome.org/study/hcp-young-adult. Additionally, we have made scripts and model specifications available in an Open Science Framework repository: https://osf.io/tf5sh/?view_only=bbe63663daf6443493ab1b330bfd3f55.

Participants

The current study included 1,050 participants (564 females) from the Human Connectome Project (HCP; Van Essen et al., 2013) young adult sample. Specifically, our subsample was taken from the full 1,206 participant HCP sample and contained all participants with fMRI data for the ToM task. This sample included individuals between the ages of 22 and 37 years (M = 28.8, SD = 3.7). Exclusion criteria for the HCP included a history of severe psychiatric, neurological, or medical disorders; however, participants were not excluded on the basis of mild psychopathology (i.e., mental illness without active psychosis or mania, medication use, or treatment for a period longer than 1 year). Informed consent was obtained for all participants (consent procedure is further detailed in Van Essen et al., 2013), and all study protocols were approved by the Institutional Review Board of Washington University in St. Louis (IRB # 201204036; “Mapping the Human Connectome: Structure, Function, and Heritability”). Participants completed a large battery of self-report measures and behavioral tasks; however, only measures and tasks relevant to our current research questions (i.e., measures of the Big Five and social cognitive ability) are discussed in this paper.

Measures

NEO Five-Factor Inventory (FFI)

The NEO-FFI is a measure of the Big Five personality traits: Conscientiousness, Agreeableness, Neuroticism, Openness to Experience, and Extraversion. It consists of 60 items taken from the longer NEO Personality Inventory, Revised (NEO PI-R; Costa & McCrae, 1992) and uses a five-point Likert scale ranging from 0 (“Strongly Disagree”) to 4 (“Strongly Agree”). Examples of Agreeableness items included “I generally try to be thoughtful and considerate,” “Most people I know like me,” and “If I don't like people, I let them know it (reversed).” The other Big Five scales (Conscientiousness, Neuroticism, Openness, and Extraversion) were used for tests of discriminant validity.

Social cognition tasks

Based on examinations of the data available in the HCP and comparisons to the existing literature, we originally identified five behavioral tasks as relevant to social cognition: a triangles ToM task, a facial emotion recognition task, a face memory condition from a working memory task, an emotional face matching task, and a moral-of-the-story identification task. After examining accuracy scores from all five of these tasks, we eventually came to focus on the first three tests due to substantial ceiling effects for the face matching and stories tasks, with most participants receiving perfect or nearly perfect scores.

Tricky Triangles Task

While in a 3T fMRI scanner, participants were presented with a series of computerized animations of shapes interacting in either a random or social way (Castelli et al., 2000; Wheatley et al., 2007). Originally designed to assess ToM abilities in autism spectrum disorders (Abell et al., 2000), the task required participants to indicate, after viewing each 20-second video clip, whether the animation was random or social in nature. In the random condition, the shapes did not interact meaningfully with each other but rather moved around purposelessly. In the social condition, the shapes moved in ways that mimicked human behavior, including a variety of interaction types demonstrating particular social sequences such as coaxing, seducing, or mocking. Participants completed a total of 10 task blocks (2 social and 3 random video blocks in the first run; 3 social and 2 random video blocks in the second run). Task blocks were separated by 15-second blocks in which participants observed a fixation cross (with 5 fixation blocks per run). Example stimuli are shown in Figure 2. Performance was scored for accuracy (i.e., whether participants correctly classified animations as random or social). Scores on the ToM triangles task were negatively skewed, and a log transformation (in which scores were reversed before and after transformation to maintain scoring direction) was used to increase their normality.
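A reflected log transformation of the kind described above might look like the following sketch. The reflection constant (maximum score plus 1) and the sign flip used to restore scoring direction are assumptions made for illustration; the exact constants used in the study are not reported here, and the accuracy values are hypothetical.

```python
import math


def reflected_log(scores):
    """Log-transform negatively skewed scores while keeping higher = better.

    Assumption: reflecting around (max + 1) and negating afterwards are
    illustrative choices, not the study's documented procedure.
    """
    top = max(scores)
    # Reflect so the long left tail becomes a right tail (all values >= 1).
    reflected = [top + 1 - s for s in scores]
    # The log compresses the (now right-sided) tail.
    logged = [math.log(r) for r in reflected]
    # Reflect again so that larger values still mean better accuracy.
    return [-v for v in logged]


accuracy = [1.0, 1.0, 0.9, 0.9, 0.8, 0.5]  # hypothetical proportions correct
transformed = reflected_log(accuracy)
print(transformed)
```

The transformation is monotonic, so the rank order of participants is unchanged; only the spacing between low scores is compressed.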

Fig. 2.

Social cognition tasks. Note. The above figure shows images taken from the three social cognition tasks used in the study.

Penn Emotion Recognition Task (ER40)

The emotion recognition task was adapted from Gur et al. (2001). In this task, participants were presented with a series of 40 faces and were asked to identify what emotion each face expressed. Emotion options included “Happy,” “Sad,” “Angry,” “Scared,” and “No Feeling.” Eight faces were presented for each emotion, half of which were male and the other half female. See Figure 2 for example stimuli and answer choices. Participants’ accuracy and reaction times were recorded.

Two-back Task

In the two-back task, participants were presented with a series of stimuli from four categories: body parts, faces, places, and tools (Barch et al., 2013). See Figure 2 for example stimuli. In each of the conditions, participants were shown a series of objects and tasked to indicate by pressing a button whenever an object (i.e., face, body part, place, or tool) was presented that had been presented two trials previously. Each block consisted only of one stimulus type. Participants completed a total of 16 blocks (two runs of the two-back for each of the four stimulus types). Each block consisted of 10 trials, lasting 2.5 seconds each. Only the face condition was included in assessing social cognitive ability due to its social relevance compared to the other conditions. Participants were scored for accuracy, separately for each of the object conditions.
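As a rough illustration of how accuracy in a single two-back block could be scored from the description above, consider the sketch below. It assumes that a trial is correct when a button press coincides with a two-back match and when no press coincides with a non-match; the stimulus identifiers and responses are hypothetical, and the HCP's actual scoring rules may differ.

```python
def two_back_accuracy(stimuli, responses):
    """Proportion of trials on which the button response was correct.

    stimuli: stimulus identifiers within one block, in presentation order.
    responses: booleans, True where the participant pressed the button.
    """
    correct = 0
    for i, pressed in enumerate(responses):
        # A trial is a target if it repeats the stimulus shown two trials back.
        is_target = i >= 2 and stimuli[i] == stimuli[i - 2]
        correct += pressed == is_target
    return correct / len(stimuli)


# Hypothetical 10-trial face block; identifiers are placeholders.
faces = ["f1", "f2", "f1", "f3", "f4", "f3", "f5", "f6", "f7", "f6"]
presses = [False, False, True, False, False, True, False, False, False, False]
block_accuracy = two_back_accuracy(faces, presses)
print(block_accuracy)
```

Here the participant detects two of the three targets and makes no false alarms, so nine of ten trials are scored as correct.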

Task fMRI Data Acquisition and Processing

We obtained data that had already undergone preprocessing and preliminary analysis by researchers at the HCP. Specifically, we used results of the HCP’s level-two, individual-subject, cortical-vertex-based analyses based on fMRI data acquired while participants completed random and social conditions of the tricky triangles task described above (Abell et al., 2000; Castelli et al., 2000). Specifics of the fMRI data acquisition are detailed in previous publications about the HCP (Uğurbil et al., 2013). In summary, whole-brain echo planar images were acquired with a 32-channel head coil on a modified 3T Siemens Skyra used for all HCP data collection at Washington University in St. Louis (TR = 720 ms, TE = 33.1 ms, flip angle = 52 deg, BW = 2,290 Hz/Px, in-plane FOV = 208 × 180 mm, 72 axial slices, 2.0-mm isotropic voxels, with a multiband acceleration factor of 8). One run of the ToM task used right-to-left phase encoding and the other used left-to-right phase encoding.

Data analysis pipelines for the HCP were primarily built using tools adapted from FreeSurfer and FSL. The first step in processing was application of the HCP “fMRIVolume” pipeline. This process generates “minimally preprocessed” 4D time series data for each run and participant, and the pipeline steps include gradient unwarping, FLIRT-based motion correction, TOPUP-based field map preprocessing using a spin echo field map, brain-boundary-based registration of EPI to structural T1-weighted scan, nonlinear (FNIRT) registration into MNI152 space, and grand-mean intensity normalization. The data were then transformed into grayordinate space, which allows for more efficient analysis of brain activation levels for components of the cortical surface. In this process, data from the cortical gray matter ribbon are projected onto the surface and then onto registered surface meshes with a standard number of vertices (in this case, approximately 30,000). Smoothing of the left and right hemisphere time series and of the autocorrelation estimates (from FILM) was done on the surface using a geodesic Gaussian algorithm.

Activity estimates were computed for the preprocessed functional time series from each run using a general linear model (GLM) implemented in FSL’s FILM (FMRIB’s Improved Linear Model) with autocorrelation correction. Predictors were convolved with a double gamma “canonical” hemodynamic response function to generate the main model regressors. To facilitate analyses of individual differences in response to given stimuli, GLM predictors were based on the category of each video clip rather than the rating of the individual (i.e., conditions were based on the appropriate response rather than each participant’s actual response, and accuracy was not considered in these GLMs). Each predictor covered the duration of a single video clip (20 s) and did not include time during fixation-cross viewing. To compensate for slice-timing differences and variability in the HRF delay across regions, temporal derivative terms derived from each predictor were added to each GLM and were treated as confounds of no interest. Subsequently, both the 4D time series and the GLM design were temporally filtered with a Gaussian-weighted linear high-pass filter with a (soft) cutoff of 200 s.
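The construction of a condition regressor described above can be sketched as a boxcar convolved with a double-gamma function. The sketch below is not FSL's implementation: the gamma shape parameters (6 and 16), the undershoot ratio (1/6), the 32-s kernel length, the clip onsets, and the run length are all assumptions chosen for illustration. Only the TR (0.72 s) and the clip duration (20 s) come from the text, and the temporal derivatives and high-pass filtering are omitted.

```python
import math


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1 / 6):
    """Double-gamma response at time t (seconds); parameter values are assumed."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        # Gamma density with unit scale.
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)


def block_regressor(onsets, duration, n_volumes, tr):
    """Convolve a boxcar (1 during each clip, 0 elsewhere) with the HRF."""
    times = [i * tr for i in range(n_volumes)]
    boxcar = [1.0 if any(on <= t < on + duration for on in onsets) else 0.0
              for t in times]
    # Sample the HRF on the TR grid for an assumed 32-s window.
    kernel = [double_gamma_hrf(k * tr) for k in range(int(32 / tr))]
    regressor = []
    for i in range(n_volumes):
        # Discrete convolution, scaled by the TR to approximate the integral.
        value = sum(boxcar[i - k] * kernel[k]
                    for k in range(min(i + 1, len(kernel))))
        regressor.append(value * tr)
    return regressor


# Hypothetical onsets (s) for two social clips in a 200-volume run.
social = block_regressor(onsets=[8.0, 84.0], duration=20.0, n_volumes=200, tr=0.72)
print(max(social))
```

Because the predictor is defined by clip category rather than by the participant's response, the same onsets would be used for every participant who saw the clips in that order.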

Fixed-effects analyses were conducted using FEAT to estimate the average effects across runs within-participants. Cross-run statistical comparisons occurred in standard grayordinates space rather than volume space. As in the individual analysis, NIFTI-1 matrices were processed separately for left and right surface and subcortical volume data, and surface outputs were converted to GIFTI at the conclusion of analysis. Participant-level z-statistic maps (computed as z-transformed t-statistics) were combined from left and right hemisphere cortical and subcortical gray matter into the recently introduced CIFTI data format, with individual z-statistics for each condition output for each cortical vertex.

Group Prior Individualized Parcellation

Network activation was identified using group prior individualized parcellation (GPIP), an approach that begins with a standard atlas of parcels for all participants but adjusts the boundaries of each parcel for each individual to correspond to their unique cortical organization. This is an effective solution to the problem that cortical functional organization is not identically related to anatomical landmarks in each person. For each participant, fMRI BOLD time-series acquired during the tricky triangles task in subject native space were resampled to the fsaverage5 cortical surface mesh (Dale et al., 1999) and normalized at each vertex. The resulting subject surface data were mapped onto a pre-defined group atlas with 400 functionally distinct regions (Schaefer et al., 2018) that align well with the 17-network atlas defined by Yeo et al. (2011). An iterative algorithm utilizing two Bayesian priors was applied to model connectivity between parcels and adjust parcel boundaries (Chong et al., 2017). Through this process, parcel boundaries were modified to reflect each participant’s unique patterns of functional connectivity. This method permits a more accurate approximation of individuals’ unique functional topography during the social cognition task while maintaining correspondence of parcels across all participants and with the atlas. Previous research evaluating GPIP has demonstrated that individualized parcels exhibit greater network coherence and better segregation of task activation compared to the parcel locations from the initial group atlas (Chong et al., 2017), and a growing body of research has reported robust associations of the parameters of individualized parcels with a variety of measures of individual differences (Anderson et al., 2021; Kong et al., 2019; Mwilambwe-Tshilobo et al., 2019).

For each participant, individualized parcels were resampled to grayordinate space to permit comparisons between parcel assignment and task activation values for each vertex. Following output of z-statistics for each cortical vertex for the social and random conditions for the ToM task (the processed data obtained from HCP’s database) and generation of individualized parcellation mappings using GPIP, we mapped the vertex-wise individual participant data onto each participant’s individualized parcels. We then computed parcel activation variables for each condition, for each cortical parcel associated with the default network, by averaging the z-statistics for vertices associated with each cortical parcel. Our parcel activation variables were sorted by default network subsystem and included 34 parcels associated with the core subsystem, 32 parcels associated with the dorsal medial subsystem, and 13 parcels associated with the medial temporal subsystem. We then reduced the number of variables for each of these subsystems, creating composite activation variables for cortical parcels that were anatomically adjacent. This left us with a total of 9 parcels (per condition) for the core subsystem, 9 for the dorsal medial subsystem, and 6 for the medial temporal subsystem. These parcels were used for subsequent analyses.
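The two averaging steps described above (vertices within parcels, then adjacent parcels within composites) can be sketched as follows. The vertex values, parcel labels, and groupings are hypothetical, and the sketch assumes that composites are unweighted means of parcel-level values; the text does not state whether parcels were weighted by their number of vertices.

```python
from collections import defaultdict
from statistics import fmean


def parcel_means(vertex_z, vertex_parcel):
    """Average vertex-wise z-statistics within each individualized parcel."""
    grouped = defaultdict(list)
    for z, parcel in zip(vertex_z, vertex_parcel):
        grouped[parcel].append(z)
    return {parcel: fmean(values) for parcel, values in grouped.items()}


def composite_means(parcel_activation, groups):
    """Average parcel-level activation within sets of adjacent parcels.

    Assumption: each parcel contributes equally to its composite.
    """
    return {name: fmean([parcel_activation[p] for p in members])
            for name, members in groups.items()}


# Hypothetical z-statistics for six vertices and their parcel assignments.
vertex_z = [1.0, 3.0, 2.0, 4.0, 0.0, 2.0]
vertex_parcel = ["p1", "p1", "p2", "p2", "p3", "p3"]
parcels = parcel_means(vertex_z, vertex_parcel)

# Hypothetical grouping of anatomically adjacent parcels into composites.
groups = {"right_temporal": ["p1", "p2"], "left_IPL": ["p3"]}
composites = composite_means(parcels, groups)
print(parcels, composites)
```

Running the same two steps separately on the social and random z-statistic maps would yield one composite activation variable per region and condition.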

Statistical Analysis

Structural equation models (SEMs) were used to examine whether variation in brain activation during the social vs. random condition, for each subnetwork of the default network, predicted social cognitive ability and Agreeableness. Separate social- and random-activation latent variables were derived for each of the three subsystems, using all the corresponding cortical parcels for each subsystem as indicators. The latent variables produced by this procedure represent the shared variance among their indicators and thus can be interpreted as reflecting variation in the tendency toward activation, in a given condition, for each subnetwork as a whole. The core subsystem latent variables had a total of nine indicators: right temporal, right IPL, right PCC/Precuneus, right dPFC, right mPFC, left IPL, left PCC/Precuneus, left dPFC, and left mPFC. The dorsal medial subsystem latent variables had a total of nine indicators: right temporal, right anterior-temporal, right dPFC, right vPFC, left temporal, left IPL, left dPFC, left vPFC, and left lPFC. The medial temporal subsystem latent variables had a total of six indicators: right IPL, right parahippocampal cortex, right retrosplenial cortex, left IPL, left parahippocampal cortex, and left retrosplenial cortex.

Using the same approach to parcellation, we also created latent variables representing activation in the frontoparietal control network (FPCN), as identified by Yeo et al. (2011), to test for discriminant validity. The FPCN makes a good contrast to the default network because it also is involved in complex cognitive processes, such as working memory and intelligence (Santarnecchi et al., 2017), but has not been strongly linked to social cognition. These frontoparietal variables were indicated by parcels located in the right PCC, right PFC, right temporal, right parietal, left PCC, left PFC, left orbitofrontal cortex, left temporal, and left parietal regions. (In Schaefer et al.’s parcellation scheme, each parcel is assigned to only one network of Yeo et al., so there was no overlap between indicators for FPCN and default network.)

In all of our models, residuals from anatomically identical manifest variables were allowed to correlate (e.g., the random and social manifest variables for right PFC activation in the dorsal medial models). Maximum likelihood estimation was used, and common fit indices were computed, including the chi-square statistic, the Tucker-Lewis index, and the root mean square error of approximation (RMSEA). The lavaan (latent variable analysis) package for R was used to estimate all models (Rosseel, 2012).
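For readers less familiar with the fit indices named above, the sketch below shows how two of them are commonly computed from model and baseline chi-square values. The chi-square values and degrees of freedom are hypothetical; only the sample size (1,050) comes from the text. Software packages differ in details (for example, some use N rather than N - 1 in the RMSEA denominator), so these functions should not be read as reproducing any particular package's output.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (one common formula)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def tucker_lewis(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index relative to a baseline (independence) model."""
    base_ratio = chi2_base / df_base
    model_ratio = chi2_model / df_model
    return (base_ratio - model_ratio) / (base_ratio - 1.0)


# Hypothetical chi-square values for illustration only.
fit_rmsea = rmsea(chi2=300.0, df=100, n=1050)
fit_tli = tucker_lewis(chi2_model=300.0, df_model=100,
                       chi2_base=5000.0, df_base=120)
print(round(fit_rmsea, 3), round(fit_tli, 3))
```

Both indices adjust the chi-square for model complexity through the degrees of freedom, which is why they are often reported alongside the chi-square itself in large samples.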

Based on these neural activation measurement models, we first tested for latent mean differences in activation for each of the latent variables representing the default network’s three primary subsystems during the social vs. random animation conditions. Subsequently, additional SEMs were used to assess the relations among latent variables representing Agreeableness, social cognitive ability, and neural activation in the three default network subsystems and the FPCN, during the social and random animation conditions. More specifically, SEMs were conducted to test the effects of default network activation during the social animation condition on 1) accuracy on the triangles task, 2) latent social cognitive ability, representing accuracy on a variety of social cognitive tasks, and 3) Agreeableness. Separate models were computed for the core, dorsal medial, and medial temporal subsystems (and for subnetworks of the FPCN) because of a high degree of multicollinearity among the neural variables, leading to failures of model convergence if all networks were included at once. Post-hoc analyses were conducted separately for temporal and prefrontal components of the default network’s dorsal medial subnetwork, given the particular relevance of these regions to social cognition and related personality traits and the fact that this network was shown to be significantly more active during social vs. random animations. Intelligence, sex (coded as 0 = female, 1 = male), age, and neural activation during the random condition (in the appropriate subnetwork) were included as covariates in all models.

In our analyses, including activation in the random condition as a covariate replaces using a contrast of the two conditions. Despite the ubiquity of contrast scores (differences in activation between two conditions) as variables of interest in neuroimaging research, this approach suffers from many of the problems that have been noted regarding the use of difference scores instead of including both conditions of interest in analyses (Allison, 1990; Edwards, 1994; Edwards, 1996; Wittenborn, 1951). When using a difference score, as is the case in a traditional fMRI contrast, variation in the effect of interest can either be due to the control condition (e.g., random animations) or the condition of interest (e.g., social animations). Difference scores do not capture any information about the association between scores on the two conditions of interest, instead imposing a linear restriction on their slopes when predicting outcome variables of interest (Allison, 1990; Edwards, 1994; Edwards, 1996; Wittenborn, 1951). Thus, if we were to use difference scores, we would not be able to identify the specific influence of social activation (vs. random activation) on our behavioral variables of interest. Including activation for both conditions as predictors allows us to partial out variance in activation that is shared between the conditions, such that we see the effect of our condition of interest after controlling for the baseline provided by the random condition. The effect of interest therefore indicates how much each subject’s activation deviates in the condition of interest from the activation that would be expected based on the control condition. Thus, we can accurately estimate the unique associations of activation during the social condition (as well as activation during the random condition) with our behavioral variables of interest.
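The linear restriction described above can be illustrated with a small simulation. The following is a minimal sketch using ordinary least squares in NumPy rather than the SEM framework used in the study; the sample size, effect sizes, and variable names are hypothetical and chosen only for illustration. Entering both conditions as predictors leaves their slopes free, whereas a difference score forces the slope for the random condition to equal the negative of the slope for the social condition.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical sample size, not the study's N

# Simulated activation: the two conditions share variance (assumed values).
random_act = rng.normal(size=n)
social_act = 0.6 * random_act + rng.normal(scale=0.8, size=n)

# Hypothetical outcome that depends on social activation only.
outcome = 0.3 * social_act + rng.normal(size=n)


def ols(predictors, y):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Both conditions as separate predictors: the two slopes are estimated freely.
_, b_social, b_random = ols([social_act, random_act], outcome)

# Difference score: a single slope, which implies b_random = -b_social.
_, b_diff = ols([social_act - random_act], outcome)

print(f"two-predictor model: social={b_social:.2f}, random={b_random:.2f}")
print(f"difference-score model: slope={b_diff:.2f}")
```

Under these assumed values, the two-predictor model should recover a slope near the simulated 0.3 for the social condition and near zero for the random condition, while the difference-score slope is pulled toward a smaller value because the constraint does not match the data-generating process.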

Latent social cognitive ability was modeled using accuracy variables for the triangles task (correct vs. incorrect responses for the random and social animations), ER40, and the face memory condition from the two-back task. In addition to this latent social cognitive ability variable, we also conducted a test using accuracy on only the ToM triangles task as a manifest criterion variable. This manifest outcome variable test was included because our neural variables measured activation specifically during this task. For all relevant models, intelligence was modeled using tests of Picture Vocabulary, Matrix Reasoning, and English Reading, as well as a hierarchical working memory factor using the four two-back task conditions, onto which the face memory variable was allowed to cross-load.

Results

Descriptive statistics for self-report measures and task performance are presented in Table 1. Pearson correlations among all variables are presented in Tables S1, S2, and S3. Weak positive zero-order correlations were observed between social cognition accuracy measures and NEO Agreeableness. Social cognition measures were also positively correlated with intelligence measures, with stronger magnitudes.

Table 1. Descriptive statistics for self-report and task measures

Hypothesis 1

SEM was used to test for mean differences in default network activation for the social vs. random conditions of the triangles task. Fit statistics for all SEMs are presented in Table 2. Results are visualized in Figure 3. For these models, our dependent variables are calculated as weighted averages of activation across the regions of a given subnetwork and can be interpreted as reflecting activation (for each given condition) in that subnetwork as a whole. Activation of the default network’s dorsal medial subsystem was significantly greater during the social condition vs. the random condition (Figure 3; z = 5.70, p < 0.001); this pattern of activation held both for regions centered on the temporoparietal junction and temporal pole (Figure 3; z = 6.26, p < 0.001) and for regions in the prefrontal cortex (Figure 3; z = 14.37, p < 0.001). Activation was significantly lower for the social condition vs. the random condition in the medial temporal subsystem (Figure 3; z = −5.68, p < 0.001) and core subsystem (Figure 3; z = −9.42, p < 0.001).

Table 2. Fit statistics for all structural equation models
Fig. 3.

Social vs. random brain activation during the triangles task. Note. DN = default network. Figure shows standardized latent means for activation in DN subsystems and the FPCN for the social vs. random conditions of the triangles task. Our dependent variables were calculated (via structural equation modeling) as weighted averages of activation across the regions of a given subnetwork and can be interpreted as reflecting activation in that subnetwork as a whole (during each condition).

Hypotheses 2a, 2b, and 3

SEM also was used to test for associations of default network activation during the social condition of the ToM task with social cognitive ability and Agreeableness (Table 3). Activation of the dorsal medial, core, and medial temporal subsystems of the default network was positively associated with accuracy on the triangles task, controlling for age, sex, intelligence, and neural activation in the random condition (Table 3). Similarly, activation of all three default network subsystems was positively associated with shared variance in accuracy across various tests of social cognitive ability (see Footnote 1). Across models, intelligence was a significant positive predictor of social cognitive ability. Furthermore, sex was a significant predictor of social cognitive ability, with females performing better on social cognition tasks on average. A full measurement and structural model is presented for the prefrontal component of the dorsal medial subsystem in Figure 4.

Table 3. Results for default network activation models
Fig. 4.

Relation between dorsal medial prefrontal activation and social cognitive ability. Note. RH = right hemisphere, LH = left hemisphere, PFC = prefrontal cortex, v = ventral, d = dorsal, l = lateral, SCog Acc = accuracy on the social cognition tasks. Structural equation modeling was used to test the association of dorsal medial default network activation during the social condition of the theory of mind task with social cognitive ability. Age, sex, intelligence, and neural activation in the random condition were included as covariates. Activation of the default network’s dorsal medial subsystem (modeled as the shared variance among activation scores in spatially contiguous regions) was positively associated with shared variance in accuracy across various tests of social cognitive ability.

Activation during the social condition significantly predicted Agreeableness for the medial temporal subsystem and for the prefrontal cortex component of the dorsal medial subsystem (Table 3). As with our social cognitive ability variable, sex was a significant predictor of Agreeableness; females had higher levels of Agreeableness. A full measurement and structural model is presented for the prefrontal cortex component of the dorsal medial subsystem in Figure 5.

Fig. 5.

Relation between dorsal medial prefrontal activation and Agreeableness. Note. RH = right hemisphere, LH = left hemisphere, PFC = prefrontal cortex, v = ventral, d = dorsal, l = lateral, NEO A = Agreeableness. Structural equation modeling was used to test the association of dorsal medial default network activation during the social condition of the theory of mind task with Agreeableness. Age, sex, and neural activation in the random condition were included as covariates. Activation of the default network’s dorsal medial subsystem (modeled as the shared variance among activation scores in spatially contiguous regions) was positively associated with shared variance in Agreeableness items from the NEO Five Factor Inventory.

A final set of analyses was conducted to examine discriminant validity. We found that activation in FPCN subnetworks during the social animations condition also significantly predicted accuracy on the triangles task and latent social cognitive ability, but not Agreeableness. Effects were generally, but not always, weaker compared to effects observed in the default network models (Table 4). Activation in the default network during the social condition blocks did not predict personality traits other than Agreeableness (i.e., Conscientiousness, Neuroticism, Extraversion, or Openness). The full results of these analyses are displayed in Table 4.

Table 4. Results for discriminant validity models

Discussion

The current study used a large fMRI sample, multiple behavioral tasks, and SEM to investigate how ToM-related activity in the default network and its subsystems was related to social cognitive ability and Agreeableness. Findings largely confirmed our three main hypotheses. Neural activity in the dorsal medial subsystem of the default network was significantly greater during the viewing of social animations compared to random animations (Hypothesis 1). Activity in the dorsal medial subsystem—while participants viewed the social animations—positively predicted performance on the triangles ToM task (Hypothesis 2a). This was true with or without controlling for covariates, such as intelligence, suggesting that the association is robust. This positive association also was found for the default network’s core and medial temporal subsystems, as well as components of the frontoparietal control network. Likewise, neural activity in these regions, during the social condition of the task, positively predicted social cognitive ability, modeled using a latent variable indicated by accuracy scores on three different social cognition tasks (Hypothesis 2b). Finally, neural activity during the social animations positively predicted individual differences in the personality trait Agreeableness, for prefrontal regions of the dorsal medial subsystem and for the medial temporal subsystem (Hypothesis 3). Associations with Agreeableness were not seen for another neural network involved in complex cognition (the FPCN), nor were other personality traits associated with default network activation, suggesting specificity for the associations among default network activation and Agreeableness.

Collectively, our results suggest that individual differences in both Agreeableness and ToM are related to variation in the same underlying neural network. Our findings reinforce previous research tying the default network—and more specifically its dorsal medial subsystem—to ToM and the ability to understand the mental states and emotions of others (Allen et al., 2017; Amodio & Frith, 2006; Carrington & Bailey, 2009; Frith & Frith, 2006; Sabbagh et al., 2004; Saxe & Kanwisher, 2003; Saxe & Powell, 2006; Saxe & Wexler, 2005; Schurz et al., 2014; Spreng & Andrews-Hanna, 2015; Spunt & Lieberman, 2012; Vogeley et al., 2001; Young et al., 2010). The present study extends previous findings by demonstrating that activity in the dorsal medial subsystem not only predicted performance on a single task, but also performance on a variety of social cognitive tasks modeled as a latent variable, thereby providing evidence for a positive association between individual differences in broad social cognitive ability and default network function. This relationship could provide insight into how both cognitive and neural variation contribute to individual differences in social functioning.

Our study demonstrated that ToM-related brain activity in prefrontal regions of the dorsal medial subsystem positively predicted individual differences in Agreeableness, a personality trait linked to social cognition and especially relevant for understanding social interactions. Previous research evaluating the association between personality and social functioning has linked both Extraversion and Agreeableness with interpersonal tendencies and trait affiliation (Côté & Moskowitz, 1998; DeYoung et al., 2013; DeYoung & Weisberg, 2018). Each of the Big Five traits can be thought of as relating to particular motivational, cognitive, and affective mechanisms (DeYoung, 2015; DeYoung & Blain, 2020). Examples include pattern detection and curiosity for Openness-Intellect (Bainbridge et al., 2019; Blain, Longenecker, et al., 2020b; DeYoung et al., 2012; Silvia & Christensen, 2020) and reward sensitivity for Extraversion (Blain, Sassenberg, et al., 2020c; Lucas et al., 2000; Smillie et al., 2012). Agreeableness appears to reflect tendencies related to navigating social norms and coordinating with the needs of others (DeYoung, 2015; DeYoung & Weisberg, 2018; Koole et al., 2001). Agreeableness in particular has been associated with prosociality (Caprara et al., 2010; Habashi et al., 2016), higher levels of satisfaction in relationships (Malouff et al., 2010; Weidmann et al., 2017), and less prejudicial behavior towards others (Sibley & Duckitt, 2008). Although Agreeableness has been positively associated with many desirable social outcomes, its underlying mechanisms remain understudied among the Big Five personality traits, with few studies having investigated its neurocognitive correlates (DeYoung & Blain, 2020). Social cognitive ability and default network function, however, appear to be promising candidates for understanding the substrates of individual differences in Agreeableness (Allen et al., 2017; Arbula et al., 2021).

Synthesizing Current Findings and Previous Work

Our findings were consistent with previous work, demonstrating positive associations of default network function with social cognitive ability and Agreeableness. For instance, previous studies with high statistical power have demonstrated positive associations of resting-state functional connectivity within the default network with ToM ability and with questionnaire measures of trait empathy and compassion (Allen et al., 2017; Takeuchi et al., 2014). The current study extends this work by examining brain activity during a ToM task rather than only during rest, and suggests that ToM-related brain activity in the dorsal medial subsystem of the default network may be associated with both Agreeableness and ToM abilities.

Together with the previous work, our findings suggest a possible explanation for why people high in Agreeableness tend to demonstrate better interpersonal outcomes than less agreeable people: highly agreeable people may have better social abilities because of differences in the function of specific brain networks, including the default network and particularly its dorsal medial subsystem. Our study also accounts for possible alternative explanations by including covariates, such as intelligence, sex, and age. Although some covariates also were related to our variables of interest, controlling for them did not eliminate the hypothesized effects.

In light of the current results, which suggest significant associations of social cognitive ability and Agreeableness with default network activation, it is worth mentioning contrasting findings from a recent study utilizing the same HCP dataset (Weiss et al., 2021). Weiss et al. found no meaningful relations between personality variables and neural activity in the same triangles task data that we analyzed in the current study. The authors attribute the lack of significant associations to methodological issues such as the questionable validity of the social cognition task and test–retest reliability of functional biomarkers. Methods such as the individualized parcellation approach of GPIP and latent variable modeling for behavioral and neural variables can increase reliability and thus the ability to detect true associations among variables (Blain, Longenecker, et al., 2020b; Campbell & Fiske, 1959; Chong et al., 2017; Eisenberg et al., 2019; Enkavi et al., 2019; Keith, 2006; Kong et al., 2021; Nosek & Smyth, 2007). This increased reliability of our variables of interest is a likely explanation for why we were able to detect significant associations among social cognition, personality, and default network activation in the current work, in contrast to the null effects observed by Weiss et al. (2021).
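The link between reliability and detectable effect size can be made concrete with the classical attenuation formula, in which the expected observed correlation equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. The sketch below is illustrative only: the true correlation and reliability values are hypothetical and are not estimates from the current data or from Weiss et al. (2021).

```python
import math


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation under the classical attenuation formula."""
    return true_r * math.sqrt(rel_x * rel_y)


# Hypothetical values chosen only for illustration.
true_r = 0.30
low_reliability = attenuated_r(true_r, rel_x=0.5, rel_y=0.6)
high_reliability = attenuated_r(true_r, rel_x=0.9, rel_y=0.9)

print(f"observed r with less reliable measures: {low_reliability:.3f}")
print(f"observed r with more reliable measures: {high_reliability:.3f}")
```

With the same assumed true association, less reliable measures yield a markedly smaller expected observed correlation, which is harder to distinguish from zero in any given sample.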

In the current study, although all default network subsystems showed robust relations to individual differences in social cognitive ability, only the dorsal medial subsystem was significantly more active during the social condition of the ToM task compared with the random condition, and only activity in the medial temporal subsystem and prefrontal regions of the dorsal medial subsystem significantly predicted individual differences in Agreeableness. Though we should avoid overinterpreting this specificity of the dorsal medial subsystem and its prefrontal regions, as effect sizes were fairly similar in magnitude across the subsystems, the current pattern of results is in line with research suggesting the dorsal medial subsystem may be more strongly linked to social cognition than the other two default network subsystems. The dorsal medial subsystem also appears to have broader functions in language processing, which can be argued to be inherently social (Spreng & Andrews-Hanna, 2015).

The core subsystem includes regions of the brain associated with social cognitive functions (Leech & Sharp, 2014; Spreng et al., 2009; Spreng & Andrews-Hanna, 2015) but also other cognitive functions that are not as specifically social in nature, such as the retrieval of autobiographical memory and personal knowledge (Moran et al., 2013; Spreng et al., 2009; Spreng & Andrews-Hanna, 2015). Similarly, the medial temporal subsystem is involved in more non-social cognitive functions, again including episodic memory (Buckner et al., 2008; Spreng & Andrews-Hanna, 2015). Future work should more specifically investigate the coordination of these three subsystems, as the joint activation of these subsystems and functional connectivity between the systems appears to be particularly relevant to social cognition and corresponding individual differences (Allen et al., 2017; Spreng & Andrews-Hanna, 2015; Takeuchi et al., 2014).

Role of General Cognitive Ability and Sex Differences

Social cognitive ability was significantly predicted by general cognitive ability and sex, as well as by Agreeableness and brain activity. The association between social cognitive ability and general cognitive ability is not surprising, because utilizing social cognitive processes likely also engages other general cognitive processes, such as working memory (Phillips et al., 2008; Spreng, 2013; Thornton & Conway, 2013), attentional processes (Holmes et al., 2003; Leslie et al., 2004; Schultebraucks et al., 2016), and nonverbal communication skills (Morrison et al., 2019). Indeed, previous research suggests strong positive correlations between general and social cognitive abilities (Allen et al., 2017; Landy, 2005; Thorndike & Stein, 1937).

What is perhaps more interesting is the potential role general intelligence might play in the association between sex and social cognitive ability, as well as how our findings might provide some explanation for why previous research has shown mixed results for sex differences in social cognitive ability (Di Tella et al., 2020). Without controlling for age or intelligence, associations between sex and the indicators for our latent social cognitive ability variable suggested little sex difference. Once general intelligence and age were introduced as covariates, however, a significant negative association appeared between sex and our latent social cognitive ability variable, indicating that females displayed higher social cognitive ability than males. Further, zero-order correlations between sex and all of the indicators of our latent IQ variable (i.e., Picture Vocabulary, English Reading, Matrix Reasoning, and all four conditions of the two-back task) show that males significantly outperformed females in general cognitive ability in our sample. Thus, it would make sense that females’ greater ability in social cognitive tasks specifically might be suppressed when not controlling for general cognitive ability. Considering this possibility, and the fact that few studies looking at sex differences in social cognitive ability have controlled for general intelligence (Navarra-Ventura et al., 2018), our study suggests that mixed results in previous research may stem from confounding sex differences in general cognitive ability with those specific to social cognitive ability. Our findings are consistent with the wealth of literature suggesting that females empathize with others more (Hoffman, 1977; Mestre et al., 2009) and are more accurate at interpreting the emotional states of others (Montagne et al., 2005; Nettle, 2007; Stiller & Dunbar, 2007; Wingenbach et al., 2018). This is in line with research indicating that females are higher in Agreeableness (Costa Jr et al., 2001; Weisberg et al., 2011).
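The suppression pattern described in this section can be sketched with simulated data. This is a minimal illustration using ordinary least squares in NumPy rather than the latent variable models used in the study; the sample size, effect sizes, and variable names are hypothetical. Sex is coded as in the text (0 = female, 1 = male). The assumed effects are set so that a male advantage in general cognitive ability offsets a female advantage specific to social cognition.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000  # hypothetical sample size, chosen only to make the pattern stable

# Coding follows the text: 0 = female, 1 = male.
sex = rng.integers(0, 2, size=n).astype(float)

# Assumed effects (illustrative, not estimates from the study).
iq = 0.4 * sex + rng.normal(size=n)
social = 0.5 * iq - 0.2 * sex + rng.normal(size=n)


def ols(predictors, y):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Zero-order association: sex alone predicting social cognition.
_, sex_zero_order = ols([sex], social)

# Adjusted association: sex predicting social cognition, controlling for IQ.
_, sex_adjusted, iq_slope = ols([sex, iq], social)

print(f"sex slope without IQ covariate: {sex_zero_order:.2f}")
print(f"sex slope with IQ covariate:    {sex_adjusted:.2f}")
```

Under these assumptions the zero-order sex slope should be close to zero, while the adjusted slope should be clearly negative, mirroring the pattern in which the female advantage appears only once general cognitive ability is controlled.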

Relevance to Psychopathology

Findings from the current study could potentially be extended in future research to benefit understanding of various forms of psychopathology. As previously mentioned, poor ToM performance has been associated with a variety of psychopathology dimensions and disorders such as schizophrenia (Abram et al., 2016; Pedersen et al., 2012), schizotypy (Blain et al., 2017; Bora, 2020), autism (Baron-Cohen et al., 1985; Brune & Brune-Cohrs, 2006), autistic traits (Best et al., 2008; Blain et al., 2017), and Williams syndrome (Tager-Flusberg & Sullivan, 2000). Likewise, low Agreeableness (i.e., Antagonism) has been associated with a host of personality disorders and negative real-life social outcomes (Anderson et al., 2018; Krueger et al., 2012).

Our results might serve to inform research done in these clinical populations and are consistent with recent dimensional and transdiagnostic frameworks for understanding psychopathology, such as the National Institute of Mental Health’s Research Domain Criteria (RDoC; Insel et al., 2010) and the Hierarchical Taxonomy of Psychopathology (HiTOP; Kotov et al., 2017). These frameworks seek to understand psychopathology in terms of underlying dimensions rather than diagnostic categories. Agreeableness and ToM, especially when considering dysfunctionally low levels of functioning, are two such promising dimensions that could be useful in clinical research and practice. Future intervention research could explore ToM deficits and low Agreeableness as transdiagnostic targets for intervention. Likewise, as the default network and its dorsal medial subsystem appear to be implicated in ToM and Agreeableness, neurostimulation research could explore whether electrical or magnetic stimulation of brain regions such as the TPJ and dmPFC might lead to changes in social cognitive ability and relevant social outcomes (Johnson et al., 2013).

Methodological Considerations

Compared with much of the previous work done on the topic, the current study uses a large dataset conferring relatively high statistical power. Although a number of existing studies have investigated possible associations between individual differences in default network function and individual differences in social cognitive ability and related traits, the majority of these studies have used sample sizes in the range of 10 to 70 individuals (Hughes et al., 2019; Inagaki & Meyer, 2020; Kaplan & Iacoboni, 2006; Song et al., 2009; Tamir et al., 2016; Wagner et al., 2011; Waytz et al., 2012; Zhang et al., 2019), which are not optimal for detecting reliable estimates of between-subjects effects (Button et al., 2013). Considering this, the design of the present study should yield more reliable findings and contribute to the robustness of the field (while still keeping in mind the limitations of our study detailed below).
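The consequence of sample size for detecting between-subjects effects can be approximated with a simple power calculation. The sketch below uses the Fisher z approximation for a two-sided test of a correlation at alpha = .05; the effect size of r = .10 and the larger sample size of 1,000 are hypothetical values chosen for illustration, not figures taken from the current study.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def correlation_power(r, n, z_crit=1.959964):
    """Approximate two-sided power (alpha = .05) to detect correlation r
    with n participants, using the Fisher z transformation."""
    effect = math.atanh(r) * math.sqrt(n - 3)
    return normal_cdf(effect - z_crit) + normal_cdf(-effect - z_crit)


# Hypothetical effect size and sample sizes.
power_small_sample = correlation_power(r=0.10, n=70)
power_large_sample = correlation_power(r=0.10, n=1000)

print(f"power at n = 70:   {power_small_sample:.2f}")
print(f"power at n = 1000: {power_large_sample:.2f}")
```

For an effect of this assumed size, a sample at the upper end of the 10 to 70 range would detect it only a small fraction of the time, whereas a sample of about a thousand would detect it in most replications.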

The approach employed in this study to measure brain function could also benefit future research on the neurobiology of individual differences. We used a network-based approach by incorporating atlases based on patterns of functional connectivity in large samples (Schaefer et al., 2018; Yeo et al., 2011). Each participant’s data were individually mapped onto a 400-parcel atlas that aligned within 17 broader functional networks identified by Yeo et al. (2011), using GPIP to ensure that parcels were adjusted to the optimal location for each participant. Each of the networks described in Yeo et al. (2011) also summarizes regions of the brain that tend to be synchronously active in patterns that can be consistently identified across samples. Understanding how specific cortical parcels map onto these broad networks can be used to unify and better understand previous findings for individual regions of interest from the social neuroscience literature (Tompson et al., 2018).

Our approach should be more effective for studying individual differences that will generalize across samples, compared with the typical use of contrasts for identifying brain regions on a voxel-wise or cluster-wise basis. By focusing on well-established, large-scale brain networks, the identified regions of interest represent a broad set of brain structures with a priori relevance for a given construct of interest (in this case, social cognition) rather than specific voxels or clusters that might be most strongly associated with that construct only by chance in any given sample (Vul et al., 2009; Yarkoni, 2009). Moreover, approaches that focus on broad networks may be more reflective of how the brain typically functions, relative to a more localized or modular approach. Brain-behavior associations appear to be more extensive than once believed, in contrast to the relatively small clusters or regions of the brain that are often reported in underpowered samples (Yarkoni et al., 2010). A majority of brain regions are involved in multiple psychological processes, and many psychological processes involve multiple different regions of the brain, not just in the case of social neuroscience (Poldrack, 2010; Yeo et al., 2011). A network-based approach allows researchers to capture a wider picture of brain function and its relation to behavioral constructs of interest; this approach, in conjunction with a large sample size, should lend itself to reproducibility and generalizability (Yarkoni, 2009).

Limitations

Despite the multiple strengths of the current study, there are some important limitations. First, although one advantage of the current study was using multiple tasks in defining our social cognition accuracy variable, we still only utilized neuroimaging data from a single ToM task in computing our neural activation variables. Future research could use an SEM approach to model how variance in brain activity during a variety of different ToM tasks completed in the scanner might predict social cognitive ability and personality. Moreover, although the current data provide evidence that individual differences in default network function are associated with social cognitive abilities and related personality traits, the causal direction and dynamics of these associations cannot be established with the current study design. Future work incorporating methods such as neuromodulation, dynamic causal modeling, experience sampling, and long-term longitudinal data collection could help to establish more clearly the causal pathways involved in the neurobiology of personality and individual differences.

Finally, even though we found a significant correlation between ToM-related activation in the dorsal medial subsystem and Agreeableness, this correlation is likely attenuated in its effect size by the personality measure used. Although the NEO-FFI is a reasonably effective brief measure for evaluating the Big Five personality traits, it was not designed to allow for the assessment of personality at lower levels of the hierarchy, including personality aspects and facets. Given possible differential associations of subdimensions within Agreeableness with social cognition and default network function (Allen et al., 2017), a measure that can distinguish between subdimensions of Agreeableness would be optimal. Future research examining the relation between these variables should include personality measures that can assess personality at multiple levels of the trait hierarchy to better discern which specific dimensions contribute to brain-behavior associations.

Conclusions

Our findings in a very large neuroimaging sample confirm and extend the current literature linking ToM, the default network, and Agreeableness. Given that ToM-related activation in prefrontal regions of the dorsal medial subsystem positively predicted both latent levels of Agreeableness and social cognitive ability, it appears that the functions of the default network may help account for the link between Agreeableness and ToM. These findings may inform future research that seeks to understand how normal functioning goes awry in psychopathology involving social deficits, and how individual differences in social cognition and related traits affect real-world relationship success, social network quality, and interpersonal functioning. In sum, the current research furthers work on the neural and personality correlates of individual differences in social cognition while demonstrating effective methods in social cognitive neuroscience research. We recommend that researchers consider using individualized parcellation methods, network-based hypotheses, and latent variable techniques such as SEM, rather than voxel-wise analyses, when designing future studies assessing individual differences in brain data.