Introduction

Natural selection led humans to punish norm violators. This willingness enabled us to establish and enforce social norms in circumstances under which cheating might otherwise have existentially unraveled our intense social groups (Fehr & Gächter, 2002; Hoffman, 2014). Punishing offenders takes time and effort, and runs the risk of retaliation. People are nevertheless willing to bear costs to implement punishment (i.e., costly punishment) (Fehr & Fischbacher, 2004a; Fehr & Fischbacher, 2004b; Fehr & Gächter, 2002; Henrich et al., 2006). It is of great interest to many researchers that people might punish even when they merely observe the norm violation as an impartial third-party—a typical human prosocial behavior, which is commonly referred to as third-party punishment (TPP) (Fehr & Fischbacher, 2004a; Fehr & Fischbacher, 2004b; Jordan et al., 2016; Leibbrandt & López-Pérez, 2012; Strobel et al., 2011).

Costly punishment is characterized by a great deal individual variability, with some people exhibiting more willingness than others to sacrifice personal resources to punish norm violators (Feng et al., 2018; Mothes,, & Strobel, 2016; Nash,, & Knoch, 2014). Such heterogeneity in the propensity for costly punishment has been documented to be related to variances in personality traits (Paz et al., 2017; Polgár, Fogd, Unoka, Sirály,, & Csukly, 2014; Sun et al., 2015; Wang et al., 2014), genetic polymorphisms (Gartner, Strobel, Reif, Lesch,, & Enge, 2018; Zhong et al., 2010), hormone levels (Burnham, 2007), as well as brain structures (Baumgartner et al., 2013; Baumgartner et al., 2015). However, whether individual differences in the propensity for costly punishment, particularly in TPP, can be predicted by a large-scale intrinsic, functional network architecture is still uninvestigated.

TPP could be measured either by economic games or hypothetical criminal scenarios. In the game-based approach, participants act in the role of observers. The amount of money participants forgo to punish a selfish player is considered as their subjective willingness to TPP (Fehr & Fischbacher, 2004b; Kahneman,, & Thaler, 1986). In the scenario-based approach, on the contrary, participants act as a jury or a judge, in which TPP takes the form of a hypothetical sentencing decision or scaled punishment (Krueger & Hoffman, 2016). In either approach, whether and how much people decide to punish a selfish person for retributive purposes as a third-party depends on how much they blame the offender. Decisions on someone’s blameworthiness are made by weighting two factors: the outcome and the violator’s intention for the norm violation (Crockett, Özdemir, & Fehr, 2014; Hoffman, 2014; Krueger & Hoffman, 2016).

The insights of social, cognitive, and affective neuroscience have helped us gain a deeper understanding of how we punish by integrating the intention and outcome of an offense. A plethora of task-based functional neuroimaging studies have underscored the importance of domain-general large-scale brain networks in understanding the neural underpinnings of social cognition and interaction (Bressler & Menon, 2010; Chiong et al., 2013; Menon, 2011; Seeley et al., 2007; Sevinc, Gurvit, & Spreng, 2017; Sheffield et al., 2015). According to a recent neuropsychological framework of costly punishment (Krueger & Hoffman, 2016), the cingulo-opercular network (CON) (often labeled also as the salience network) signals a norm violation and generates an emotional response to the adverse outcomes of the violation; the default mode network (DMN) signals the intention behind the norm violation and then integrates that signal with the harm signal from the CON into an assessment of blame; and the frontoparietal network (FPN) (often labeled also as the central executive network) eventually converts the blame signal into a punishment decision. Those three networks have been widely proved to be involved in TPP (Bellucci et al., 2017; Civai and Sanfey, 2019; Dosenbach et al., 2007; Glass et al., 2016; Krueger et al., 2014; Zhong et al., 2016).

Much of what we know about the neuropsychological mechanisms of costly punishment is based on task-based fMRI studies that correlate experimental conditions or behavioral performance with neural activation as indexed by changes in blood oxygenation level-dependent (BOLD) signals. Task-free resting-state fMRI (RS-fMRI) provides an appealing alternative to characterize brain functions by scanning individuals while being in the scanner with no specific instructions except to relax and hold still (Bressler & Menon, 2010; Smitha et al., 2017). Intrinsic FC observed at rest refers to temporal correlations of spontaneous coherent low-frequency (0.01–0.1 Hz) BOLD fluctuations in distant grey matter regions, and provides a unique and stable profile for an individual’s intrinsic functional architecture of the brain (Fox & Raichle, 2007; Greicius et al., 2003; Lee and Shimony, 2013).

Using a variety of MRI scanners (e.g., vendors, field strengths) and analysis techniques (e.g., seed methods, independent component analysis, clustering), RS studies have consistently reported that anatomically separated, but functionally linked, brain regions show a high level of ongoing FC during rest (Fox & Raichle, 2007; van den Heuvel & Pol, 2010). Such networks are called RS networks and several functionally linked subnetworks have been identified, including CON, DMN, and FPN (Menon, 2011). Those networks are stable across time and strikingly similar to the networks activated by a broad spectrum of task-based fMRI studies (Bressler & Menon, 2010; Menon, 2011; Smitha et al., 2017). Furthermore, RSFC reflects the intrinsic organizations of the brain and acts as a neural “fingerprint” that can account for variability in task-evoked activity and behavioral propensity (Dubois & Adolphs, 2016; Finn et al., 2015; Fox et al., 2005).

Task-based and task-free fMRI not only reveal universal principles of brain organization but also neurodiversity of brain-behavior relationships. A recognized method of examining individualized predictions of brain-behavior relationships is referred to as prediction-analytics framework, which relates whole-brain RSFC patterns to phenotypic variables, such as costly punishment via multivariate prediction analyses (i.e., multivariate regression) (Kelly et al., 2012). This approach is sensitive enough to detect subtle effects in the brain that cannot be achieved by traditional univariate methods (Dosenbach et al., 2010; Orru et al., 2012; Yarkoni & Westfall, 2017). Multivariate prediction analyses have been successfully applied to RS studies to identify significant biomarkers of cognitive impairment and certain mental disorders (Chen et al., 2011; Orru et al., 2012; Shen & Hu, 2010). Furthermore, it has been widely used to establish the relationship between RSFC and individual variability of social behaviors (Alavash et al., 2018; Bellucci et al., 2018; Feng et al., 2018; Rosenberg et al., 2016; Seeley et al., 2007).

Most of the existing fMRI studies on the neural signatures of costly punishment have focused on examining group-level brain activities. While these group-averaging findings provide valuable insights into the neural signatures of costly punishment, they lack detailed characterization of individual-specific brain systems that are predictive of individual-level costly punishment behaviors (Gordon et al., 2017). Only one RS-fMRI study to date has addressed this issue by applying multivariate prediction analyses to distinguish two groups of individuals with different propensities to punish (Feng et al., 2018). However, this study focused on second-party punishment (SPP) and applied a multivariate classification method by artificially converting continuous punishment scores into a categorical variable. In contrast to this methodological approach, a multivariate regression appears more apt to predict individual neurodiversity of costly punishment behaviors.

The current study investigated the prediction of individual differences in TPP propensity based on RSFC networks by implementing a prediction-analytics framework. RSFC offers a reliable neural index to determine sources of behavioral heterogeneity in costly punishment and has been successfully applied to predict individual variances in SPP. However, it is not known whether those domain-general large-networks at RS, which have been proved to predict individualized SPP, are equally capable of predicting individualized TPP propensity.

Before undergoing task-free fMRI, participants completed economic TPP game outside the scanner deciding on how much to punish players based on their proposed unfair offers. On the behavioral level, we expected a linear increase of TPP punishment due to the increasing unfairness of offers and calculated the average TPP across different levels of unfairness as a measure of individuals’ TPP propensity across norm-violating scenarios. On the neural level, we hypothesized that individual differences in average TPP are predicted by RSFC within the FPN since the FPN plays a decisive role in determining the final punishment (Krueger & Hoffman, 2016).

Methods

Participants

Forty-four healthy volunteers (23 females, 21 males, mean ± standard deviation: m = 23.6 ± 3.3)—enrolled either from the University of Mannheim or the University of Heidelberg in Germany—participated for financial compensation in this study. All participants were right-handed and had a normal or corrected-to-normal vision. The local ethics committee approved the study, which was conducted by the ethical guidelines and principles of the Declaration of Helsinki. Participants gave written consent after being informed about the procedure of the study: they were informed that their participation was voluntary, and they could drop out at any time point during the study. For completing the two sessions, participants received €25 as their financial compensation (€10 for the behavioral and €15 for the scanning session). In addition, they could earn additional money based on their decision of a randomly chosen economic game (with an exchange rate of 1 monetary unit [MU] = 10 cents).

Paradigm

Each participant was required to complete two types of economic games: first a single-short dictator game (DG) to create a backup story for this study followed by a series of TPP games. For the DG, both players received 6 MUs and the proposer received additional 12 MUs to split with the other player. Participants were assigned to the role of a proposer and asked to provide the initials of their names (or any 2 letters) after the game. They were further informed that the information of the interaction—including the initials and the decision for this game—would be recorded in the database used for other participants in later TPP games.

For the TPP, participants witnessed a series of DGs in a random order (just like the ones that they had completed before)—consisting of seven offers ranging from fair to unfair (6:6; 7:5; 8:4; 9:3; 10:2; 11:1; 12:0) from different proposers identified with different initials. Participants were assigned to the roles of third parties and received 6 MUs as an endowment per game. They could decide either to keep all MUs or punish proposers using integer MUs sizes (i.e., 0, 1, 2, …, 6) from their endowment based on the observed offers. The MUs used for punishment by the third parties were tripled in amount (Fehr & Fischbacher, 2004b; Henrich et al., 2006) and reduced the proposers’ payoffs. For example, if a proposer decided to keep 12 MUs (besides the initial 6 MUs) and participants as third-parties decided to spend all their 6MUs, the payoffs of both the proposers and the third parties would be brought down to zero.

Procedure

This experiment was part of a comprehensive study investigating the neuropsychological signatures of human cooperative behavior. Participants needed to attend two sessions at two different times within 2 weeks for the reported experiment. In the first session, participants were asked to fill out demographic surveys and to complete the DG and TPP games. In the second session, they completed a structural MRI and an RS-fMRI scan, each lasting approximately 10 minutes, as well as a task-fMRI scan. During the RS scan, participants were instructed to keep their eyes on the crosshair displayed on a screen and not to fall asleep. Here, only the data of the TPP games and the RS- fMRI are reported (the results for the task-fMRI experiment will be reported in a different publication). At the end of the study, participants were asked about their beliefs about the experimental setup filling out a debriefing questionnaire. No participant expressed doubt as to whether real people were involved in the study, and their final payoffs also depended on their decisions in the DG and TPP game.

Behavioral Data Analyses

The behavioral data analyses were carried out using SPSS 21.0 (IBM Corp 2012) with an alpha set to p < 0.05 (two-tailed). To identify the effects of variations in the fairness of offers, a one-way repeated-measures analysis of variance (ANOVA) on TPP behaviors was performed with Offer (6:6, 7:5, 8:4, 9:3, 10:2, 11:1, 12:0) as a within-subjects factor. To identify whether average punishment across all offers could be used as an index of TPP propensity across norm-violating scenarios, we calculated Pearson correlations between the average TPP and the punishment in the relatively fair condition (6:6 and 7:5 offers), moderately unfair condition (8:4 and 9:3 offers), and highly unfair condition (10:2, 11:1, and 12:0 offers).

MRI Data Acquisition

Brain images were collected using a Siemens Trio-3T MRI scanner (Siemens Medical Systems, Erlangen, Germany) with a 32-channel head coil. For each participant, a high-resolution anatomical scan of the entire brain was collected using a T1-weighted 3D magnetization prepared rapid acquisition with gradient echo (MP-RAGE) sequence (time of repetition (TR), 2,300 ms; echo time (TE), 3.03 ms; flip angle, 9°; number of slices, 192; field of view (FOV), 256 mm; matrix size, 256 x 256; voxel size, 1 x 1 x 1 mm. The blood oxygen level-dependent (BOLD) signal for functional images was measured using a T2-weighted gradient echo-planar imaging (EPI) sequence with the following parameters: TR, 2,000 ms; TE, 30 ms; flip angle, 80°; thickness, 3 mm; number of slices, 36; FOV, 192 mm; matrix size, 64 x 64 mm; voxel size, 3 x 3 x 3 mm. The first five scans of the EPI were discarded to minimize T1 effects.

Neuroimaging Preprocessing

The MRI data analysis was performed using the Statistical Parametric Mapping Toolbox (SPM 12, Wellcome Trust Centre for Neuroimaging, London, UK; http://www.fil.ion.ucl.ac.uk/spm/software/spm12/) running under Matlab R2018a (Mathworks Inc., Natick, MA). Pre-processing of the EPI images included the following steps. To allow signal equilibrium, the first ten volumes of functional images were discarded. Afterward, the functional images were bias-corrected for field inhomogeneity. Subsequently, the slice-timing method was performed on functional images, using interpolation to correct the time point of each functional image. Next, all functional images were aligned to the mean image for movement correction to reduce the artifact during the imaging. Functional images were then co-registered to their anatomical images, and both anatomical images and functional images were spatially normalized to the MNI template through deformation fields derived from anatomical segmentation (resampling voxel size was 2 × 2 × 2 mm3). Lastly, functional images were smoothed with isotropic Gaussian kernel of 4 x 4 x 4 mm3 full width at half maximum (FWHM) to increase the ratio of signal-to-noise.

The artifact of functional images was detected and excluded using the ART toolbox (https://www.nitrc.org/projects/artifact_detect/) based on the following criteria. First, head displacement in x, y, or z-direction greater than 2 mm from the previous frames was rejected. Second, rotational displacement greater than 0.02 radians from the previous frame was rejected. Third, compared with the intensity of the mean image, global mean intensity in the functional images greater than three standard deviations was rejected. Subsequently, those outliers were included as nuisance regressors within the first-level general linear model. Ultimately, high-frequency noise and linear drift artifacts were removed by implementing the band-pass filter (0.01-0.1 Hz).

Resting-State Functional Connectivity Analyses

The Resting-State Functional Connectivity (RSFC) between regions of interests (ROIs) was determined based on the Dosenbach atlas, consisting of 142 ROIs (i.e., nodes) and belonging to five RSFC networks: (i) cingulo-opercular network (CON), (ii) sensorimotor network (SMN), (iii) default-mode network (DMN), (iv) frontoparietal network (FPN), and (iv) occipital network (OccN) (Dosenbach et al., 2010).

For each participant, the BOLD signal of all voxels residing in every ROI was averaged. Between each pair of ROIs, the network edge representing the FC between both ROIs was calculated by the Pearson correlation coefficient using the Functional Connectivity (CONN) toolbox (https://www.nitrc.org/projects/conn). Pearson’s correlation coefficients were transformed into Fisher’s z values to represent the degree of FC. For each participant, a correlation matrix with 10,011 unique connections between each pair of the 142 ROIs was created. Among the resulting 10,011 unique connections, within-network connectivity values of the five cortical RSFC networks were entered as features in the multivariate predictive model with TPP propensity as the target.

Multivariate Prediction Analyses

To predict variations in participants’ average TPP, a support vector regression (SVR) was implemented for training and testing the prediction models—using fitrsvm in Matlab (https://mathworks.com). The algorithm was trained on intra-network functional connectivity values of 5 RSFC networks as defined by Dosenbach et al. (2010). Specifically, the correlation values between ROIs within each specific RSFC network were entered into multivariate regression models as features to predict participants’ TPP propensity.

A leave-one-subject-out cross-validation (LOSOCV) approach was implemented for each prediction model. In every iteration, the SVR was trained on all but one subject and predictive performance was tested on the left-out subject. The procedure was repeated iteratively n times, with n being the total number of iteration (i.e., subjects in our sample). This yielded a total of n model-based prediction (one for each subject) and their accuracy was tested by comparing them against the true punishment level of each participant. The standardized mean squared error (smse) was computed to assess the cross-validated model performance (i.e., the average error of the model’s predictions divided by the targets’ variance)—providing an indicator of the model’s accuracy and prediction quality for cross-dataset comparisons (Poldrack and Varoquaux, 2020).

To test the significance of the estimated smse of the model trained on true targets, a permutation test with 10,000 cross-validated permutations was implemented. In each permutation, the cross-validation above described was repeated by training the model with randomly permuted targets, yielding a random distribution of smse values that represented our reference distribution to evaluate the significance of the predictions made by the true model trained on the true targets. The probability that the true model generated its prediction by chance given the observed data (D) was computed by estimating the probability that the same model trained on permuted targets performed better (had lower smse values) than the true model as follows:

$$ P\left({M}_{perm}<{M}_{true}|D\right)=\frac{1+{N}_{perm}}{1+{N}_{sample}}, $$

where Mtrue is the performance of the true model, Mperm is the performance of the permuted models, Nperm is the total number of permuted models performing better (with lower smse values) than the true model, and Nsample is the total number of permutations (i.e., 10,000). Finally, the actual test statistics (i.e., 1) was added to the equation.

As LOSOCV has the danger of inflated results in small samples (Poldrack et al., 2020), the same procedure was run with a more conservative 11-fold cross-validation method.

Results

Behavioral Results

The means and standard errors of mean for the average punishments across offers are displayed in Fig. 1. A one-way repeated-measures ANOVA revealed a main effect for Offer (F(6,258) = 30.95, p < 0.001). Within the ANOVA, tests of within-subjects contrasts showed that punishment increased linearly from the fairest offer (i.e., 6:6) to the most unfair offer (i.e., 12:0) (F(1, 43) = 59.01, p < 0.001). Moreover, average TPP across norm-violating scenarios was significantly correlated with the average punishment magnitudes for relatively fair (6:6 and 7:5, r = 0.695, p < 0.001), moderately unfair (8:4, 9:3,10:2, r = 0.954, p < 0.001), and highly unfair (10:1, 11:1, 12:0, r = 0.927, p < 0.001) offers (see supplementary materials for the correlation between average punishment across all offers and punishment in each of the seven offers).

Fig. 1
figure 1

Behavioral results for costly punishment. Average punishment magnitudes (MUs) for each level of unfairness. Punishment magnitudes increased linearly from fair to unfair offers. Error bars: ± standard error mean.

Multivariate Prediction Analyses Results

A prediction-analytics framework (i.e., machine-learning algorithm) was applied to predict participants’ average TPP based on functional connectivity of 5 RSFC networks using the Dosenbach’s atlas (Dosenbach et al., 2010). The predictive analysis yielded a significant cross-validated prediction of average third-party punishment for the FPN model (FPN: smse =1.03, p = 0.014). On the contrary, none of the other RSFC network models obtained a significant prediction of individual TPP behavior (CON: smse = 1.08, p = 0.199; DMN: smse = 1.59, p = 0.812; SMN: smse = 1.12, p = 0.141; OccN: smse = 1.819, p = 0.785) (Fig. 2).

Fig. 2
figure 2

Multivariate analysis for prediction performances of third-party punishment (TPP). A. Workflow of the multivariate analysis. Regions of interest (ROI = 142) were chosen and partitioned into five functional connectivity networks. Pearson correlation values representing RSFC were computed between ROIs and extracted to build network-specific correlation matrices. These correlation values were entered into multivariate regression models as features to predict participants’ third-party punishment behavior as targets. The support vector regression (SVR) algorithm was implemented to make out-of-sample predictions using a cross-validated approach. Prediction performance was finally tested against distribution of predictions based on SVR models trained with randomly permutated targets (permutation test). B. Results for prediction of TPP. The significance of performance (i.e., standard mean square error, smse, red dot line) for the five cross-validated prediction models for third-party punishment are shown based on the implemented prediction framework. L, left; R, right; RS-fMRI, resting-state functional magnetic resonance imaging; RSFC, resting-state functional connectivity; SVR, support vector regression; smse, standard mean squared error: CON, cingulo-opercular network; SMN, sensorimotor network; FPN, frontoparietal network; DMN, default-mode network; Occn, Occipital network

Using a more conservative 11-fold cross validation procedure, the main findings for the LOSOCV analyses were validated. The FPN model remained significant in predicting the propensity for TPP (FPN: smse = 1.04, p = 0.017), whereas the other RSFC network models did not reach significance (CON: smse = 0.99, p = 0.079; DMN: smse = 1.68, p = 0.887; SMN: smse = 1.03, p = 0.060; OccN: smse = 2.00, p = 0.903).

Furthermore, the quality of the networks’ prediction was compared. A one-way ANOVA revealed a significant effect for both cross-validation approaches (LOSOCV: F(4,172) = 2.76, p = 0.029; 11-fold: F(4,172) = 4.93, p = 0.0009), indicating a significant difference in the predictions across models. In particular, individual punishment behaviors based on the FPN were significantly better (had smaller errors) than the average punishment predictions of all other networks (LOSOCV: t(43) = −2.79, p = 0.008; 11-fold: t(43) = −2.90, p = 0.006). Assuming a significance threshold of p < 0.01 after correcting for multiple comparisons across the five RSFC networks, FPN predictions were significantly better than predictions of the DMN (LOSOCV: t(43) = −2.89, p = 0.006; 11-fold: t(43) = -3.64, p = 0.0007) and OccN models (LOSOCV: t(43) = −2.79, p = 0.008; 11-fold: t(43) = −3.21, p = 0.003). On the contrary, despite showing the same trend, predictive differences between FPN and CON (LOSOCV: t(43) = −1.06, p = 0.295; 11-fold: t(43) = −0.75, p = 0.459) and between FPN and SMN (LOSOCV: t(43) = −0.965, p = 0.340; 11-fold: t(43) = −0.50, p = 0.620) did not reach significance.

Discussion

Costly punishment enables human societies to maintain stable cooperation not only within genetic relatives but also among unrelated strangers on a large scale (Fehr & Gächter, 2002; Henrich et al., 2006). Combining economic exchange games with task-free fMRI and multivariate prediction analyses, our study investigated the prediction of interindividual differences in TPP propensity based on RSFC within large-scale networks (FPN, CON, SMN, DMN, and OccN). We showed that the amount of punishment increased with more unfair offers and that individual differences in the propensity to punish with a cost as an unaffected third-party could be predicted by RSFC within the FPN.

At the behavioral level, we found a linear increase in the amount of TPP with an increase from fair to unfair offers—consistent with previous evidence (Civai et al., 2019; Fehr & Fischbacher, 2004b; Pedersen and McCullough, 2018; Stallen et al., 2018). This linear increase of punishment in response to different levels of unfairness is not surprising, as it has been well-demonstrated that the amount of TPP imposed on norm violators is a function of the severity of the violation (Henrich et al., 2006). This linear punishment pattern might be due to more negative emotional responses (e.g., anger) elicited by the increasingly unfair offers (Fehr & Fischbacher, 2004b; Stallen et al., 2018).

However, it also is possible that people varied their punishment decisions due to some reference effects. People might use the increasing unfairness to calibrate their punishment decisions, because this is the only feature that varied across the rounds of games (Pedersen and McCullough, 2013). Therefore, it is problematic to use responses to the highly unfair condition in which proposers shared no or only a small amount of resources with the receivers to measure costly punishment behavior as previous studies did (Feng et al., 2018; Knoch et al., 2010). Here, we used the average punishment across a range of offers (from fair to unfair) as an index of propensity for TPP. In fact, we found that the average TPP was highly correlated with punishment magnitudes at different levels of unfairness, indicating that people with higher propensity to punish indeed punished more harshly across norm-violating scenarios. Therefore, we argue that the propensity for TPP reflects a relatively stable trait of an individual to intervene as a third-party in various social contexts.

At the neural level, we found that individual differences in TPP propensity were predicted by the RSFC within the FPN. This finding provides further support that individual differences in social preferences can be captured by the brain based characteristics. Individuals display a great heterogeneity of social preferences, with some being more selfish while others being more altruistic (Morishima et al., 2012; Nash and Knoch, 2014). The striking individual differences in social preferences cannot be explained simply by demographic or psychological variables and may in fact originate from our neural traits reflected by a variety of indexes, such as structural magnetic resonance imaging (MRI), resting-state electroencephalography, resting-state brain activities, and connections (Baumgartner et al., 2013; Nash and Knoch, 2014). For example, task-free RSFC predicts inter-individual differences in the propensity for trust and reciprocity (Bellucci et al., 2018; Lu et al., 2019) as well as for SPP (Feng et al., 2018).

Our research revealed the intrinsic FC within the FPN as a critical neural index predictive of individual differences in the propensity for punishment as an unaffected third-party. Only a few studies have examined the underlying neural underpinnings that represent individual differences of human costly punishment. In a pioneering study, Knoch et al. (2010) used resting electroencephalography as a measure of intrinsic cortical activity and demonstrated that task-free baseline cortical activity in the right lateral prefrontal cortex predicts individual differences of acceptance rates of unfair offers in the ultimatum game, emphasizing the role of self-regulation function in shaping costly punishment. In a more recent study, Feng et al. (2018) combined RSFC with a machine-learning approach to successfully classify individuals with different punishment propensities in the ultimatum game, revealing that the most discriminative features contributing to the classification are the regions from three large-scale RS networks: FPN, DMN, and CON. For the first time, our study provided evidence that the intrinsic functional organization exhibited by the FPN in the absence of any explicit tasks serves as an important neural index for the heterogeneity in TPP propensity.

Evidence showed that the RSFC within the FPN is essential in supporting higher-order goal-directed behaviors, and the FPN intra-network connectivity has been suggested to be an indicator of cognitive performance (Dosenbach et al., 2007; Sheffield et al., 2015; Vincent et al., 2008). The intrinsic network architecture during resting state highly corresponds to the brain’s functional network architecture during task states (Cole et al., 2014). Recent task-based fMRI research has consistently demonstrated the critical roles of the three large-scale networks (CON, DMN, and FPN) in TPP (Bellucci et al., 2017; Buckholtz et al., 2008; Buckholtz et al., 2015; Ginther et al., 2016; Krueger & Hoffman, 2016; Schleim et al., 2011; Treadway et al., 2014; Yang et al., 2019; Zhong et al., 2016). In particular, the FPN plays a special role in enabling the enforcement of TPP by integrating and coordinating different streams of information (Buckholtz & Marois, 2012; Krueger & Hoffman, 2016). However, previous task-based fMRI studies on the FPN associated with TPP focused more on the task-evoked activation of specific regions, such as the dorsolateral prefrontal cortex (dlPFC). For example, the dlPFC is argued to be engaged in resolving the conflict between monetary self-interest and fairness-related social norm enforcement (Strang & Park, 2017; Yang et al., 2019). Alternatively, the dlPFC is argued to be involved in constructing a scale for punishment and selecting an appropriate punishment that fits the norm violation (Buckholtz & Marois, 2012; Krueger & Hoffman, 2016). However, our study focused on the role of the spontaneous functional connectivity between all the brain regions within the FPN in uncovering the source of heterogeneity in the propensity to punish as a third-party. We extend previous finding by revealing that the RSFC of the FPN may serve as an important neural index for identifying individual-level characteristic patterns in response to unfairness as a third-party.

It needs to be noted that although the FPN remained the only significant network in predicting TPP propensity and that the predictions of individual TPP propensity based on the FPN outperformed the average punishment predictions of all other networks, the CON and SMN did not differ significantly from the FPN in making prediction about individual TPP propensity.

The CON has been typically identified to be engaged in norm-violation detection during costly punishment (Krueger & Hoffman, 2016). The RSFC of brain regions from the CON was able to discriminate individuals with different SPP propensities (Feng et al., 2018). The SMN has been typically recognized to be recruited in sensorimotor function, including hand or finger tapping (Biswal, Yetkin, Haughton, & Hyde, 1995; Seidler et al., 2015). It also has been suggested to be involved in number processing due to the widely used finger counting strategies during the acquisition of numerical skills in our childhood (Tschentscher, Hauk, Fischer, & Pulvermuller, 2012). Recent evidence showed that the intrinsic structural and function features of regions in the SMN are able to predict individual differences in trust propensity, implicating the role of calculating cost and benefits during social decision making in economic games (Feng et al., 2021). The RSFC within the CON and SMN could potentially be involved in representing individual TPP propensity, because they might be implicated in the capacities of signaling norm violations (CON) and performing cost-benefit calculations (SMN) in costly punishment. However, because neither networks’ predictive performance reached significance, their contribution to the predictions about the individual TPP propensity should be further tested in future studies with larger sample sizes.

Despite the novel contributions of our study, several limitations need to be addressed. First, we used multivariate prediction analyses enabling out-of-sample generalization between RSFC and TPP behavior in a small student sample at one time point, but future studies are needed to replicate our findings with a larger sample size across a more heterogenous samples and across different time points.

Second, we used a standard economic TPP game in which participants were assigned to the role of an observer who could reduce the money of the proposer. The experimental demand for punishment in this paradigm might yield overestimates of individuals’ propensity to punish strangers for being proposing unfair offers to others. Previous studies showed that when people are given a reward option in addition to the punishment, their TPP level significantly decreases (Pedersen et al., 2013). Another methodological limitation of this paradigm is that we cannot completely exclude the influence of reference effects on punishment decisions by using a range of offers from fair to unfair. Although the randomized order of the offers may partially control this effect, it remains an open question whether the linear change of punishment is due to participants’ responses to increased unfairness or the perceptual features that varied across choices. What should also be noted is that the MUs used for punishment by the third parties were tripled in amount to reduce the proposers’ payoffs—a punishment ration typically used in previous studies (Fehr & Fischbacher, 2004b; Henrich et al., 2006). If participants spent all of their 6 MUs to punish proposers who kept all of the additional 12 MUs, the payoff of the proposers would be brought down to zero, which could be considered as the most serious punishment. However, in this case because the receivers were not affected by the decision and kept 6MUs in the end, some participants might consider this as not fair and thus might be not willing to implement the severest punishment. Future studies should recognize these methodological limitations and improve the paradigm accordingly. Additionally, we did not control for fairness judgment and therefore the relationship between the RSFC of the FPN and TPP propensity could be potentially confounded by how fair an individual perceives an offer to be, despite our usage of average punishment as an index for individual TPP propensity.

Third, although we manipulated the severity of unfairness offers in the experiment, we did not manipulate the level of intention. Future studies are needed to verify whether the same neural index remains significant in predicting TPP propensity in the contexts where the intention varies.

Finally, both economic punishment games and hypothetical criminal vignettes paradigms are commonly employed in studying TPP. Whereas economics game paradigms have the advantage of giving the participants real incentives to engage in costly punishment behaviors, hypothetical criminal vignettes paradigms are more morally and legally salient and can be easily varied by intention and outcome. Previous research always chose one or the other but has never directly compared the neural signatures of the costly punishment between the two paradigms. Future studies must verify that RSFC networks predicting individual differences of costly punishment in economic game paradigms are similar to those predicting costly punishment in hypothetical criminal vignettes paradigms.

Despite these limitations, our study provides novel and essential evidence on the ability to predict individual differences in the propensity for costly punishment in the context of TPP from RSFC. In summary, our results showed RSFC within the FPN predicted individual differences in TPP propensity—supporting evidence for domain-general large-scale networks of costly punishment (Krueger & Hoffman, 2016). Our findings help to identify RSFC brain networks for predicting individual differences in the propensity of costly punishment and deepen our understanding of why people punish differently under various social contexts.