Warmth and competence in human-agent cooperation

Interaction and cooperation with humans are overarching aspirations of artificial intelligence (AI) research. Recent studies demonstrate that AI agents trained with deep reinforcement learning are capable of collaborating with humans. These studies primarily evaluate human compatibility through "objective" metrics such as task performance, obscuring potential variation in the levels of trust and subjective preference that different agents garner. To better understand the factors shaping subjective preferences in human-agent cooperation, we train deep reinforcement learning agents in Coins, a two-player social dilemma. We recruit $N = 501$ participants for a human-agent cooperation study and measure their impressions of the agents they encounter. Participants' perceptions of warmth and competence predict their stated preferences for different agents, above and beyond objective performance metrics. Drawing inspiration from social science and biology research, we subsequently implement a new "partner choice" framework to elicit revealed preferences: after playing an episode with an agent, participants are asked whether they would like to play the next episode with the same agent or to play alone. As with stated preferences, social perception better predicts participants' revealed preferences than does objective performance. Given these results, we recommend human-agent interaction researchers routinely incorporate the measurement of social perception and subjective preferences into their studies.


Introduction
Trust is central to the development and deployment of artificial intelligence (AI) [41,89]. However, many members of the public harbor doubts and concerns about the trustworthiness of AI [15,21,25,44]. This presents a pressing issue for cooperation between humans and AI agents [18].
Algorithmic development research has been slow to recognize the importance of trust and preferences for cooperative agents. Recent studies show that deep reinforcement learning can be used to train interactive agents for human-agent collaboration [14,55,91,92]. The "human-compatible" agents from these experiments demonstrate compelling improvements in game score, task accuracy, and win rate over established benchmarks. However, a narrow focus on "objective" metrics of performance obscures any differences in subjective preferences humans develop over cooperative agents. Two agents may generate similar benefits in terms of typical performance metrics, but human teammates may nonetheless express a strong preference for one over the other [87,91]. Developing human-compatible, cooperative agents will require evaluating agents on dimensions other than objective performance.
What shapes subjective preferences for artificial agents, if not a direct mapping of agent performance? One possible source of variance is social perception. When encountering a novel actor, humans rapidly and automatically evaluate the actor along two underlying dimensions: warmth and competence [1,29,28,58]. These perceptions help individuals "make sense of [other actors] in order to guide their own actions and interactions" [27] (Figure 1). The competence dimension aligns with the established focus on performance and score in machine learning research [10]: How effectively can this actor achieve its interests? Appraising an actor's warmth, on the other hand, raises a novel set of considerations: How aligned are this actor's goals and interests with one's own? Research on social cognition consistently demonstrates that humans prefer others who are not only competent, but also warm [1,30]. Hence, we predict that perceived warmth will be an important determinant of preferences for artificial agents.
Here we run behavioral experiments to investigate social perception and subjective preferences in human-agent interaction. We train reinforcement learning agents to play Coins, a mixed-motive game, varying agent hyperparameters known to influence cooperative behavior and performance in social dilemmas. Three co-play experiments then recruit human participants to interact with the agents, measure participants' judgments of agent warmth and competence, and elicit participant preferences over the agents.
Experiments evaluating human views of agents frequently rely on stated preferences, typically by directly asking participants which of two agents they preferred as a partner [22,87,91]. Such self-report methods can be insightful tools for research [68]. However, they exhibit limited ecological validity and are vulnerable to experimenter demand [72]. In this paper, we overcome these challenges by eliciting revealed preferences [78]: Do people even want to interact with a given agent, if given the choice not to? Partner choice, or the ability to leave or reject an interaction, is a well-established revealed-preference paradigm in evolutionary biology and behavioral economics [5,6,12,88]. While studies that measure revealed preferences (e.g., [49,73]) are not inherently immune to experimenter demand, partner-choice measures can mitigate demand effects when participants are compensated based on their performance ("incentivized choice"; see [72]). Partner choice also carries external validity for interaction research: in the context of algorithmic development, we can view partner choice as a stand-in for the choice to adopt an artificial intelligence system [8,20,67]. Users may test out several suggestions from a recommender system before deciding whether or not to rely on it for future decisions. Similarly, commuters might tentatively try several rides from self-driving cars to help choose whether to transition away from traditional driving. Overall, partner-choice study designs empower participants with an ability to embrace or leave an interaction with an agent, and thus incorporate an ethic of autonomy [9,41,61] into human-agent interaction research.
In summary, this paper makes the following contributions to cooperative AI research:
1. Demonstrates the use of reinforcement learning to train human-compatible agents for a temporally and spatially extended mixed-motive game.
2. Connects human-AI interaction research to frameworks from psychology and economics, identifying tools that researchers can easily import for their own studies.
3. Illustrates how social perceptions affect stated and revealed preferences in incentivized interactions with agents, above and beyond standard performance metrics.

Task
Coins [31,33,51,71] is a mixed-motive Markov game [54] played by n = 2 players (Figure 2; see also Appendix A for full task details). Players of two different colors occupy a small gridworld room with width w and depth d. Coins randomly appear in the room throughout the game, spawning on each cell with probability P. Every coin matches one of the two players in color. On each step of the game, players can stand still or move around the room.
The goal of the game is to earn reward by collecting coins. Players pick up coins by stepping onto them. Each coin collection generates reward for both players as a function of the match or mismatch between the coin color and the collecting player's color. Under the canonical rules (Table 1), a player receives +1 reward for picking up a coin of any color. If a player collects a coin of their own color (i.e., a matching coin), the other player is unaffected. However, if a player picks up a coin of the other player's color (i.e., a mismatching coin), the other player receives −2 reward. In the short term, it is always individually advantageous to collect an available coin, whether matching or mismatching. However, this strategy will lower the score of the other player. Players achieve the socially optimal outcome by collecting only the coins that match their color.

Two properties make Coins an ideal testbed for investigating perceptions of warmth and competence. First, as a consequence of its incentive structure, Coins is a social dilemma [48]: players can pursue selfish goals or prosocial goals. Second, relative to matrix games like the Prisoner's Dilemma, Coins is temporally and spatially extended [50]: players can employ a variety of strategies to achieve their goals, with differing levels of efficiency. We hypothesize that these two features offer sufficient affordance for an observer to infer other players' intentions and their effectiveness at enacting their intentions [11,75,97].
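To make the canonical payoff rules concrete, the snippet below is a minimal, illustrative sketch of the reward logic described above; our actual environment is implemented in DeepMind Lab2D, so the helper function and player labels here are purely hypothetical.

```python
def canonical_coin_rewards(collector: str, coin_color: str, players=("red", "blue")) -> dict:
    """Rewards generated by a single coin collection under the canonical rules (Table 1).

    The collector always earns +1. If the coin mismatches the collector's color,
    the other player receives -2; otherwise the other player is unaffected.
    """
    other = players[1] if collector == players[0] else players[0]
    rewards = {collector: 1, other: 0}
    if coin_color != collector:   # mismatching coin imposes a negative externality
        rewards[other] = -2
    return rewards

# Example: the red player picks up a blue (mismatching) coin.
print(canonical_coin_rewards("red", "blue"))   # {'red': 1, 'blue': -2}
```

Collecting every available coin maximizes short-term individual reward, but the −2 externality from mismatching collections is what turns the task into a social dilemma.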
Our experiments use a colorblind-friendly palette, with red, blue, yellow, green, and purple players and coins (Figure A1a). During agent training, we procedurally generate rooms with width w and depth d independently sampled from U{10, 15}. Coins appear in each cell with probability P = 0.0005. Episodes last for T = 500 steps. Each episode of training randomly samples colors (without replacement) for agents.
In our human-agent interaction studies, co-play episodes use w = d = 11, P = 0.0005, and T = 300. Player colors are randomized across the five players (one human participant and four agent co-players) at the beginning of each study session, and held constant across all episodes within the session.
In Study 1, humans and agents play Coins with the canonical rules. In Studies 2 and 3, humans and agents play Coins with a slightly altered incentive structure. Each outcome increases by +2 reward, making all rewards in the game non-negative (Table 2). Since all rewards are offset by the same amount, this reward scheme preserves the social dilemma structure in Coins.

Agent design and training
We leverage deep reinforcement learning to train four agents for our human-agent cooperation studies (see Appendix B for full agent details). Overall, our study design and measurement tools are agnostic to the algorithmic implementation of the agents being evaluated. For this paper, the agents learn using the advantage actor-critic (A2C) algorithm [62]. The neural network consists of a convolutional module, a fully connected module, an LSTM with contrastive predictive coding [39,66], and linear readouts for policy and value. Agents train for 5 × 10^7 steps in self-play, with task parameters as described in Section 2.1.

We consider two algorithmic modifications to the agents to induce variance in social perception. First, we build the Social Value Orientation (SVO) component [59], an algorithmic module inspired by psychological models of human prosocial preferences [36,52,63], into our agents. The SVO component parameterizes each agent with θ, representing a target distribution over their own reward and the reward of other agents in their environment. SVO agents are intrinsically motivated [86] to optimize for task rewards that align with their parameterized target θ. For these experiments, we endow agents with the "individualistic" value θ = 0° and the "prosocial" value θ = 45°.
Second, we add a "trembling hand" [17,81] component to the agents for evaluation and co-play. With probability ϵ, the trembling-hand module replaces each action selected by the agent with a random action. This component induces inefficiency in maximizing value according to an agent's learned policy and value function. For these experiments, we apply the "steady" value ϵ = 0 and the "trembling" value ϵ = 0.5.
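A minimal sketch of how such an action-noise wrapper can operate (the integration with our evaluation harness may differ; the function below is illustrative):

```python
import random

def trembling_hand(policy_action: int, num_actions: int, epsilon: float) -> int:
    """With probability epsilon, discard the policy's chosen action and emit a
    uniformly random action instead; otherwise pass the policy action through."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    return policy_action

# epsilon = 0 reproduces the learned policy exactly ("steady");
# epsilon = 0.5 randomizes half of the actions on average ("trembling").
```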
Table 3: Predictions for social perception (warmth and competence) as a function of agent hyperparameters (Social Value Orientation θ and trembling hand ϵ).

                          ϵ = 0 (steady)                  ϵ = 0.5 (trembling)
θ = 45° (prosocial)       high warmth, high competence    high warmth, low competence
θ = 0° (individualistic)  low warmth, high competence     low warmth, low competence
Table 3 summarizes the hyperparameter values and predicted effects for the four evaluated agents.

Study design for human-agent studies
We recruited participants from Prolific [70,69] for all studies (total N = 501; 47.6% female, 48.4% male, 1.6% non-binary or trans; m_age = 33, sd_age = 11). We received informed consent from all participants across the three studies. In each study, participants earned a base level of compensation for completing the study, with an additional bonus that varied as a function of their score in the task. Appendix C presents full details of our study design, including independent ethical review and study screenshots.
Overall, our studies sought to explore the relationship between social perception and subjective preferences in Coins. Study 1 approached these constructs using the canonical payoff structure for Coins [51] and an established self-report framework for eliciting (stated) preferences [91]. This study framework leverages a "within-participants design", meaning that each participant interacts with all agents in the study. Within-participants designs increase the data points collected per participant and allow for better control of between-individual variance, thus improving statistical power for a given sample size.
We subsequently sought to understand whether the findings from Study 1 replicate under a partner choice framework. Does social perception exhibit the same predictive power for revealed preferences as it does for stated preferences? While within-participants designs offer several statistical advantages, they are not ideal for studying partner choices. Exposing participants to multiple potential partners can introduce order effects, where the exact sequence of interactions influences participants' responses. Within-participants designs may also fatigue participants, potentially compromising the quality of their responses as the study progresses. Consequently, participants' partner choices may be progressively less motivated by genuine preference and more influenced by extraneous factors. Thus, to study revealed preferences, we switched to a "between-participants design" in which each participant interacted with one randomly selected agent. Given that humans respond more strongly to losses than to commensurate gains [42], we made one additional change, testing participants' partner choices under a shifted incentive structure with all non-negative outcomes (Table 2). To isolate the effects of the change from stated to revealed preferences, we approached these changes in two stages. Study 2 used the same stated-preference approach and within-participants design as Study 1, but incorporated the offset incentive structure. Study 3 then elicited revealed preferences in place of stated preferences.
We tested the following hypotheses in our studies:
H1. Social perception predicts participants' stated and revealed preferences. That is, human participants will prefer to play with agents they perceive as warm and competent.
H2. Social perception predicts participants' stated and revealed preferences, above and beyond the scores that participants receive. That is, participants' social perceptions of agents will contribute to their preferences independently of the scores they receive when playing with the agents.
H3. Social perception will correlate positively with the sentiment of participants' verbal impressions of the agents. That is, participants' social perceptions of agents will emerge as positive sentiment in participants' verbal descriptions of the agents.

Study 1
Our first study aimed to explore the relationship between social perception and stated preferences across the four agents. We recruited N = 101 participants from Prolific (45.5% female, 51.5% male; m_age = 34, sd_age = 13). The study employed a within-participants design: each participant encountered and played with the full cohort of co-players (i.e., all four agents).
At the beginning of the study, participants read instructions and played a short tutorial episode alone, without a co-player, in order to learn the game rules and payoff structure (Table 1). The study instructed participants that they would receive $0.10 for every point earned during the remaining episodes. Participants then played 12 episodes with a randomized sequence of agent co-players, generated by concatenating every possible combination of co-players. Each of these co-play episodes lasted T = 300 steps (1 minute). After every episode, participants rated how "warm", "well-intentioned", "competent", and "intelligent" the co-player from that episode was on five-point Likert-type scales (see Figure 3a). After every two episodes, participants reported their preference over the agent co-players from those episodes on a five-point Likert-type scale (see Figure 3b). In the experiment interface, we referred to the first agent co-player in each two-episode sequence as "co-player A". The interface similarly referred to the second agent in each two-episode sequence as "co-player B". Because the sequence of co-players was produced by concatenating all co-player combinations, each participant stated their preferences for every possible pairing of co-players.
After playing all 12 episodes, participants completed a short post-task questionnaire. The questionnaire first solicited open-ended responses about each of the encountered co-players, then collected standard demographic information and open-ended feedback on the study overall. The study took 22.4 minutes on average to complete, with a compensation base of $2.50 and an average bonus of $7.43.

Study 2
Our second study tested the relationship between social perception and stated preferences under the shifted incentive structure for Coins (Table 2). As before, participants played 12 episodes with a randomized sequence of agent co-players, generated such that they rated and compared every possible combination of co-players. As in Study 1, the interface referred to the first agent co-player in each two-episode sequence as "co-player A" and to the second agent as "co-player B". The study took 23.2 minutes on average to complete, with a compensation base of $2.50 and an average bonus of $6.77.

Study 3
Our final study assessed whether the predictiveness of social perception extends to a revealed-preference framework. We recruited N = 301 participants from Prolific (51.3% female, 45.0% male, 1.7% non-binary; m_age = 33, sd_age = 11). In contrast with the preceding studies, Study 3 employed a between-participants design: each participant interacted with a single, randomly sampled agent.
The majority of the study introduction remained the same as in Study 2, with some instructions altered to inform participants they would play Coins with a single co-player (as opposed to multiple co-players, like in Studies 1 and 2). After reading the instructions and playing a short tutorial episode alone, participants played one episode of Coins with a randomly sampled co-player. After this episode, participants rated how "warm", "well-intentioned", "competent", and "intelligent" their co-player was on five-point Likert-type scales (see Figure 3a). Participants subsequently learned that they would be playing one additional episode, with the choice of playing alone or playing with the same co-player. Participants indicated through a binary choice whether they wanted to play alone or with the co-player (see Figure 3c). They proceeded with the episode as chosen, and then completed the standard post-task questionnaire.
The study took 6.2 minutes on average to complete, with a compensation base of $1.25 and an average bonus of $1.25.


Agent evaluation
We evaluate agents with ϵ ∈ {0, 0.25, 0.5, 0.75, 1} to understand the effect of the trembling-hand module on agent behavior (Figures A5-A7). As expected, higher ϵ values degrade performance. Total coin collections decrease with increasing ϵ for both selfish and prosocial agents. Higher levels of ϵ cause prosocial agents to become less discerning at avoiding mismatching coins, and consequently produce lower levels of collective return.

Human-agent studies
In addition to the results and information presented here, Appendix C offers expanded explanations and full details of our statistical analyses.

Study 1
Participants played with each agent three times during the study, evaluating the relevant agent after each episode of play. Participants did not make judgments at random; their responses were highly consistent across their interactions with each agent (Table 4). At the same time, participants were not submitting vacuous ratings. Perceptions varied significantly as a function of which trait participants were evaluating, F 3,4744 = 96.2, p < 0.001.

Table 4: Participants' evaluations of their co-players were highly consistent, as assessed by the intraclass correlation coefficient (ICC) [84]. ICC ranges from 0 to 1, with higher values indicating greater consistency.
Psychology research often employs composite measures to assess cognitive constructs (attributes and variables that cannot be directly observed). Combining multiple individual measures into composite scales can reduce measurement error and provide a more stable estimate of the latent construct underlying the scale [53,56]. Following standard practice in social perception research [30], we computed two composite measures for further analysis. A composite warmth measure averaged participants' judgments of how "warm" and how "well-intentioned" their co-player was. A composite competence measure similarly combined individual judgments of how "competent" and "intelligent" each co-player was. Both composite measures exhibit high scale reliability as measured by the Spearman-Brown formula [23], with ρ = 0.93 for the composite warmth measure and ρ = 0.92 for the composite competence measure.
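For a two-item scale, the Spearman-Brown prediction is ρ = 2r / (1 + r), where r is the correlation between the two items. The sketch below (assuming ratings are stored as NumPy arrays; the variable names are illustrative) shows how such composites and their reliabilities can be computed:

```python
import numpy as np

def composite_and_reliability(item_a: np.ndarray, item_b: np.ndarray):
    """Average two Likert items into a composite scale and estimate its
    two-item reliability via the Spearman-Brown formula, rho = 2r / (1 + r)."""
    composite = (item_a + item_b) / 2
    r = np.corrcoef(item_a, item_b)[0, 1]
    rho = 2 * r / (1 + r)
    return composite, rho

# e.g., composite warmth from the "warm" and "well-intentioned" ratings:
# warmth, rho_warmth = composite_and_reliability(warm_ratings, well_intentioned_ratings)
```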
Social perception. As expected, the SVO and trembling-hand algorithmic components generated markedly divergent appraisals of warmth and competence. Participants perceived high-SVO agents as significantly warmer than low-SVO agents, F 1,1108 = 1006.8, p < 0.001 (Figure 5a). Similarly, steady agents came across as significantly more competent than trembling agents, F 1,1108 = 70.6, p < 0.001 (Figure 5b). Jointly, the algorithmic effects prompted distinct impressions in the warmth-competence space (Figure 6).
Stated preferences. How well do participants' perceptions predict subjective preferences, relative to predictions made based on objective score? We fit competing fractional response models to assess the influence of score and social perception, respectively, on self-reported preferences. We then compared model fit using the Akaike information criterion (AIC) [3] and Nakagawa's R² [65]. We fit an additional baseline model using algorithm identities (i.e., which two agents participants were comparing) as a predictor. The model leveraging algorithm identities and the model leveraging participant scores both accounted for a large amount of variance in subjective preferences (Table 5, top and middle rows). Participants exhibited a clear pattern of preferences across the four agents (Figure A19). In pairwise comparison, participants favored the θ = 45° agents over both θ = 0° agents, and the θ = 0°, ϵ = 0.5 agent over the θ = 0°, ϵ = 0 agent. The score model indicated that the higher a participant scored with co-player A relative to co-player B, the more they reported preferring co-player A, with an odds ratio OR = 1.12, 95% CI [1.11, 1.13], p < 0.001.
Nevertheless, knowing participants' judgments generates substantially better predictions of their preferences than the alternatives (H1; Table 5, bottom row). Both perceived warmth and perceived competence contribute to this predictiveness (Figure 7a). The warmer a participant judged co-player A relative to co-player B, the more they reported preferring co-player A, OR = 2.23, 95% CI [2.08, 2.40], p < 0.001 (Figure 7b). Unexpectedly, the more competent co-player A appeared relative to co-player B, the less participants tended to favor co-player A, OR = 0.78, 95% CI [0.73, 0.84], p < 0.001.
As a further test of the predictive power of participants' social perceptions, we fit another regression with perceived warmth and competence as predictors, this time including score as a covariate (i.e., controlling for the effect of score). Score significantly and positively predicts preference in this model, p < 0.001 (Figure A20). Even so, the effects of warmth and competence remain significant, with p < 0.001 and p = 0.012, respectively. Among these three predictors, perceived warmth exhibits the largest effect on co-player preferences. That is, it provides a substantial independent signal alongside score and perceived competence when used to predict stated preferences. Social perception thus improves model fit above and beyond that provided by score alone (H2).

Fig. 7: The difference in participants' evaluations of the warmth of co-player A over co-player B significantly correlates with their stated relative preference for co-player A, p < 0.001. Perceived competence exhibits a similar (significant) relationship with preferences, p < 0.001. (a) and (b) depict odds ratios and preference predictions, respectively, from a fractional-response regression. Error bars and bands represent 95% confidence intervals.
Impression sentiment. As a supplementary analysis, we explore the open-ended responses participants provided about their co-players at the end of the study. For the most part, participants felt they could recall their co-players well enough to offer their impressions through written descriptions: in aggregate, participants provided impressions for 82.2% of the agents they encountered.
For a quantitative perspective on the data, we conduct sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner) [40]. Echoing the correspondence between warmth and stated preferences, the warmer participants perceived a co-player throughout the study, the more positively they tended to describe that co-player, β = 0.13, 95% CI [0.09, 0.16], p < 0.001 (Figure 8). In contrast, competence did not exhibit a significant relationship with sentiment, p = 0.24. Warmth evaluations, but not competence evaluations, correlated positively with the sentiment of participants' impressions toward their co-players (H3).
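As an illustration of this scoring step, the snippet below uses the vaderSentiment package; the example impression text is invented, and our full preprocessing pipeline is not shown:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

impression = "They were friendly and only collected their own coins."
scores = analyzer.polarity_scores(impression)
# Returns 'neg', 'neu', 'pos', and a normalized 'compound' score in [-1, 1];
# the compound score is the usual single-number sentiment summary.
print(scores["compound"])
```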
Anecdotally, participants expressed a wide range of emotions while describing their co-players. The θ = 45° agents often evoked contrition and guilt:
- "The red player seemed almost too cautious in going after coins which worked for me but made them seem easy to pick on, even though I wouldn't do that."
- "I think I remember red being too nice during the game. It made me feel bad so I tried not to take many points from them."
- "This one wasn't very smart and I stole some of their coins because it was easy. I feel kind of bad. It moved so erratically."

Fig. 8: Relationship between social perception and impression sentiment in Study 1. The sentiment that participants expressed toward different co-players correlated with (a) their evaluations of warmth, p < 0.001, but not with (b) their judgments of competence, p = 0.24. Error bands represent 95% confidence intervals.
Participants discussed the θ = 0° agents, on the other hand, with anger and frustration:
- "Very aggressive play-style. Almost felt like he was taunting me. Very annoying."
- "They seemed very hostile and really just wanting to gain the most points possible."
- "I felt anger and hatred towards this green character. I felt like downloading the code for this program and erasing this character from the game I disliked them so much. They were being hateful and mean to me, when we both could have benefited by collecting our own colors."

Study 2
Our second study tested whether these effects and results remained robust when participants played Coins under a shifted incentive structure. The alternative structure increased the rewards for coin collections so that players cannot receive negative rewards (Table 2). As expected, this shift resulted in participants earning higher rewards than in Study 1.

Overall, the perceptual and preference patterns from Study 1 replicated under the alternative incentive structure. As before, participants' warmth and competence evaluations display satisfactory psychometric properties. Participants' judgments varied significantly depending on the trait in question, F 3,4650 = 88.5, p < 0.001. At the same time, participants rated individual agents consistently for each given trait (Table 6). The composite measures show high scale reliability, with ρ = 0.92 for the composite warmth measure and ρ = 0.91 for the composite competence measure.
Stated preferences. We again fit fractional response regressions to understand the relationship between objective metrics, perceptions, and subjective preferences. The model with co-player identities as predictors captured a large amount of variance in stated preferences (Table 7, top row). Participants reported a distinct pattern of preferences across the agents (Figure A23). In pairwise comparison, participants favored the θ = 45° agents over the θ = 0° agents, and the θ = 0°, ϵ = 0.5 agent over the θ = 0°, ϵ = 0 agent. The model with participant score as the sole predictor performed considerably worse than it did in Study 1 (Table 7, middle row). Still, it captured the same pattern as before: the higher a participant scored with co-player A relative to co-player B, the greater their preference for co-player A, OR = 1.06, 95% CI [1.06, 1.07], p < 0.001.

Fig. 10: The difference in participants' judgments of warmth for co-players A and B exhibits a significant relationship with their stated preference for co-player A over co-player B, p < 0.001. Competence evaluations similarly significantly contribute to preference predictions, p < 0.001. (a) and (b) depict odds ratios and preference predictions, respectively, from a fractional-response regression. Error bars and bands reflect 95% confidence intervals.
Participants' perceptions again serve as a better foundation for preference predictions than either game score or the identity of the specific algorithms they encountered (H1; Table 7, bottom row, and Figure 10a). The warmer a participant perceived co-player A relative to co-player B, the more they reported preferring co-player A, OR = 2.63, 95% CI [2.42, 2.89], p < 0.001 (Figure 10b). The negative relationship between competence and preferences appeared again: the more competent co-player A appeared relative to co-player B, the less participants tended to favor co-player A, OR = 0.81, 95% CI [0.76, 0.88], p < 0.001.
We next fit a joint regression with perceived warmth and competence as predictors, controlling for score. In this model, score significantly and positively correlates with stated preferences, p < 0.001 (Figure A24). As in Study 1, warmth and competence judgments remain significant predictors of participants' preferences, with p < 0.001 and p = 0.001, respectively. Once again, perceived warmth demonstrates an effect on stated preferences that exceeds the contributions of score and perceived competence. Social perception enhanced model fit above and beyond that provided by score on its own (H2).
Impression sentiment. At the end of the study, participants recalled 77.3% of their co-players well enough to describe their impressions through written responses. Again, the warmer participants perceived a co-player throughout the study, the more positively they tended to describe that co-player, β = 0.12, 95% CI [0.09, 0.15], p < 0.001 (Figure 11a). Breaking from the prior study, perceptions of competence exhibited a similar effect on post-game impression sentiment: the more competent an agent seemed, the more positively participants described them, β = 0.04, 95% CI [0.00, 0.08], p = 0.037 (Figure 11b). Both warmth and competence judgments positively correlate with the sentiment expressed in participants' impressions of the agents (H3).

Study 3
Our final study tested whether the relationship between social perceptions and subjective preferences translates to a revealed-preference setting. Does social perception continue to predict preferences when individuals face a partner choice?

Social perception. As in the previous two studies, the composite warmth and competence measures exhibit high scale reliability, with ρ = 0.85 for the composite warmth measure and ρ = 0.86 for the composite competence measure. Agents prompted distinct warmth and competence profiles depending on their parameterization, just as seen in Studies 1 and 2 (Figure 12). Participants perceived high-SVO agents as significantly warmer than low-SVO agents, F 1,297 = 103.4, p < 0.001 (Figure A25a). Similarly, steady agents came across as significantly more competent than trembling agents, F 1,297 = 35.3, p < 0.001 (Figure A25b).

Revealed preferences. To compare the performance of social perception against objective metrics, we fit three logistic regressions predicting participants' (binary) partner choice. We evaluated these models via AIC and Nagelkerke's R² [64].
Participants exhibited a clear pattern of preferences across the agents (Table 8, top row). On average, participants favored the θ = 45° agents over the θ = 0° agents, and the θ = 0°, ϵ = 0.5 agent over the θ = 0°, ϵ = 0 agent. The model with participant score as the sole predictor fared somewhat worse at predicting preferences (Table 8, middle row). All the same, the pattern from Studies 1 and 2 replicated in Study 3: the higher a participant scored with their co-player, the more likely they were to choose to play with that co-player again, OR = 1.06, 95% CI [1.03, 1.08], p < 0.001.
For a third time, social perception offers stronger predictiveness than do score or co-player identity (H1; Table 8, bottom row). The warmer a co-player appeared to participants, the more likely participants were to play another episode with them, OR = 2.10, 95% CI [1.69, 2.65], p < 0.001 (Figure 13b). There was no significant relationship between perceived competence and partner choice, p = 0.88.

Fig. 13a: Odds ratios from the model predicting partner choice.

We subsequently fit a regression using perceived warmth and competence as predictors and controlling for score. In this model, score significantly and positively predicts revealed preferences, p = 0.011 (Figure A27). The effect of perceived warmth on preferences remains significant, p < 0.001, whereas competence evaluations fail to significantly correlate with preferences, p = 0.44. Regardless, the independent effect of perceived warmth exceeded the contribution of score. Overall, social perception improved model fit above and beyond that provided by score alone (H2).
Impression sentiment. At the end of the study, participants recalled 94.3% of the agents they encountered well enough to provide their impressions in written descriptions. The warmer participants perceived a co-player, the more positively they tended to describe that co-player, β = 0.14, 95% CI [0.10, 0.18], p < 0.001 (Figure 14a). Despite the lack of correspondence between perceived competence and partner choice, perceptions of competence exhibited a similar effect on post-game impression sentiment: the more competent an agent seemed, the more positively participants described them, β = 0.04, 95% CI [0.00, 0.08], p = 0.041 (Figure 14b). Both dimensions of social perception correlated positively with the sentiment of participants' impressions toward their co-players (H3).

Summary
Overall, we find evidence in support of each of our initial hypotheses:
H1. Social perception significantly predicted participants' preferences for different agents, as measured through both self-report and partner choice. Participants consistently favored agents that they perceived as warmer and, to a smaller extent, that they perceived as less competent.
H2. The predictive power of perceived warmth and competence extended beyond the insight provided by agent performance. Social perception provided more accurate preference predictions than standard indicators of performance, including the amount of reward received and the specific identity of the agent involved in the interaction.
H3. Social perception correlated positively with the sentiment expressed in participants' verbal impressions of the agents. Participants employed more positive language to discuss agents that they rated higher on warmth and on competence.
In summary, these three studies provide clear evidence linking perceived warmth and competence to human preferences for artificial agents, over and above objective indicators of agent performance.

Discussion
Our experiments demonstrate that artificial agents trained with deep reinforcement learning can cooperate and compete with humans in temporally and spatially extended mixed-motive games. Human interactants perceived varying levels of warmth and competence when interacting with agents. Objective features like game score predict humans' preferences over different agents. However, preference predictions substantially improve by taking into account people's social perceptions; success in an interaction is driven not just by its objective outcomes, but by its social dimensions, too. This holds true whether examining stated or revealed preferences.
Participants preferred warm agents over cold agents, as hypothesized, but, unexpectedly, our sample favored incompetent agents over competent agents. These patterns offer potential support for the primacy of warmth judgments observed in interpersonal perception [2]. On the other hand, they may also emerge from the particular algorithm and parameter values that we investigated. It would be interesting to train agents using a wider range of parameter values, testing the robustness of these patterns. Such studies could investigate potential compensation effects between agent warmth and competence (e.g., the tendency to perceive incompetent partners as exceptionally warm; [96]) and build a broader mapping from agent parameters to participants' perceptions and preferences. Are there agents that balance the influence of warmth and competence evaluations on preferences, or is the relative contribution of perceived warmth robust across settings?
Another possible explanation stems from the flexible content of "competence" judgments in mixed-motive games [93]. Did the tutorial or the study instructions inadvertently emphasize the competitive elements of Coins? Our study design may have primed participants to be adversarial, and thus to view selfishness as competence. Follow-up research should investigate a more diverse range of incentive structures and tasks to explore the robustness of this pattern [60].
Our results reinforce the generality of warmth and competence. Perceptions of warmth and competence structure impressions of other humans [77], as well as impressions of non-human actors including animals [82], corporations [45], and robots [76,79]. In combination with recent studies of human-agent interactions in consumer decision-making contexts [34,46] and the Prisoner's Dilemma [58], our experiments provide further evidence that warmth and competence organize perceptions of artificial intelligence.
Competitive games have long been a focal point for AI research [13,83,85,94]. We follow recent calls to move AI research beyond competition and toward cooperation [18]. Most interaction research on deep reinforcement learning focuses on pure common-interest games such as Overcooked [14,91] and Hanabi [87], where coordination remains the predominant challenge. Expanding into mixed-motive games like Coins opens up new challenges related to motive alignment and exploitability. For example, participants who played with (and exploited) altruistic agents expressed guilt and contrition. This echoes findings that, in human-human interactions, exploiting high-warmth individuals prompts self-reproach [4]. At the same time, it conflicts with recent work arguing that humans are "keen to exploit benevolent AI" [43]. Understanding whether these affective patterns generalize to a wider range of mixed-motive environments will be an important next step, particularly given the frequency with which people face mixed-motive interactions in their day-to-day lives [16,18]. Human-agent interaction research should continue to explore these issues.
Preference elicitation is a vital addition to interactive applications of deep reinforcement learning. Incentivized partner choices can help test whether new algorithms represent innovations people would be motivated to adopt. Though self-report can introduce a risk of experimenter demand, we also find a close correspondence between stated and revealed preferences, suggesting that the preferences individuals self-report in interactions with agents are not entirely "cheap talk" [24]. Stated preferences thus represent a low-cost addition to studies that can still strengthen interaction research over sole reliance on objective measures of performance or accuracy. Overall, preference elicitation may prove especially important in contexts where objective metrics for performance are poorly defined or otherwise inadequate (e.g., [74]).

In a similar vein, subjective preferences may serve as a valuable objective for optimization. Deep learning researchers have recently begun exploring approaches of this kind. For example, some scientists attribute the recent success of large language models, including the popular system ChatGPT [80], to their use of "reinforcement learning from human feedback" (RLHF) methods. Given a pre-trained model, RLHF applies reinforcement learning to fine-tune the final layers of the model, optimizing for reward calculated from simulated human preferences. Of course, these optimization methods carry their own risks. As recognized by Charles Goodhart and Marilyn Strathern, "when a measure becomes a target, it ceases to be a good measure" [35,90]. Future studies can investigate the viability of such approaches.

Nonetheless, preferences are not a panacea. Measuring subjective preferences can help focus algorithmic development on people's direct experience with agents, but it does not solve the fundamental problem of value alignment, the "question of how to ensure that AI systems are properly aligned with human values and how to guarantee that AI technology remains properly amenable to human control" [32]. In his extensive discussion of value alignment, Gabriel [32] identifies shortcomings with both "objective" metrics and subjective preferences as possible foundations for alignment. Developers should continue to engage with ethicists and social scientists to better understand how to align AI with values like autonomy, cooperation, and trustworthiness.

A Task details
This appendix provides detailed information on the implementation and parameterization of the Coins task.
We implement Coins using the DeepMind Lab2D platform [7]. Table A1 details the task settings used for agent training, as well as for the tutorial and co-play episodes in the human-agent interaction studies. Figure A2 depicts the 11 × 11 room used for all co-play episodes in the interaction studies.
Following common convention in reinforcement learning research, this paper uses "step" to refer to the minimal, discrete unit of time within the environment. During each step, each player may take an action and receive rewards, and the environment can transition from one state to another.

Table A1: Parameters for the Coins task in agent training and human-agent cooperation studies. *In Study 3, co-play episodes involve either n = 1 or n = 2 players, depending on whether the participant chose to play alone or with the co-player from the prior episode, respectively. **Tutorial episodes terminated after participants collected five coins or after T = 1500 steps (5 minutes).

A.1.1 Players and coins
The two primary entities in the Coins task are players and coins (Figure A1a). We use a colorblind-friendly palette for the player and coin sprites in our experiments. Each episode of Coins involves two distinct colors sampled from the five available colors (e.g., red and blue). Each player included in an episode matches one of these two colors. On each step, coins spawn on empty cells and cells occupied by players with probability P. When a coin spawns, it appears as one of the two colors, each with 50% probability.
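A simplified sketch of this per-step spawning rule follows (the Lab2D implementation differs in its details; the data structures here are illustrative):

```python
import random

def spawn_coins(cells, episode_colors, p_spawn):
    """One spawning pass: each cell that currently holds no coin gains a coin
    with probability p_spawn, colored with either episode color (50% each).
    `cells` is a list of dicts, each with an optional 'coin' entry."""
    for cell in cells:
        if cell.get("coin") is None and random.random() < p_spawn:
            cell["coin"] = random.choice(episode_colors)
```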
Players can enter empty cells and cells with coins in them. Each player blocks other players from entering the cell they currently occupy.

A.1.2 Miscellaneous
Walls (Figure A1b, top row) block players from moving into a cell, and thus define the boundaries and dimensions of the room. Otherwise, players are able to freely move through empty cells (Figure A1b, bottom row) in the room.

A.2 Observations
DeepMind Lab2D supports two different visual frames of reference [47] for players:
1. Egocentric, meaning that players always see their avatar in an invariant position (e.g., centered) in their visual input.
2. Allocentric, meaning that the world itself remains in an invariant position (e.g., centered) in players' visual input.
Egocentric frames of reference promote generalization for reinforcement learning agents [38,95]. Consequently, we provide egocentric observations for our agents (Figure A3a). The agent observation spans five cells in each direction (forward, backward, left, and right) from the agent. Given the sprite dimensions (8 × 8 × 3, with the last dimension reflecting RGB channels), agents observed their environment through an 88 × 88 × 3 window.
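As a quick sanity check on those observation dimensions (purely illustrative arithmetic):

```python
# 5 cells in each direction plus the agent's own cell -> 11 cells per side;
# each cell is an 8x8 sprite with 3 RGB channels.
cells_per_side = 2 * 5 + 1
sprite_px = 8
obs_shape = (cells_per_side * sprite_px, cells_per_side * sprite_px, 3)
print(obs_shape)   # (88, 88, 3)
```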
In pilot tests of our study interface, human players reported being confused by the changes in their visual observation when playing with an egocentric reference frame. In combination with prior evidence that allocentric reference frames can support human navigation in virtual environments [19], this feedback prompted us to provide human players with an allocentric reference frame for the environment (Figure A3b). As human players move their avatar around the Coins gridworld, the room stays centered within their visual input while their avatar moves around the room.

A.3 Actions
Players can take one of five actions on each step: stand still (no-op), move up, move down, move left, or move right.

B Agent details
This appendix provides detailed information on the architecture, parameterization, and training of the artificial agents that we study.

B.1 Agent architecture
We build an advantage actor-critic (A2C) agent with two added algorithmic components (Figure A4). The Social Value Orientation (SVO) component, integrated for training, evaluation, and co-play, recomputes the environment reward signal received by the agent before its use by the critic. As detailed in Section B.2, the θ parameter guides this computation. The trembling-hand component, incorporated for evaluation and co-play, sits between the actor and the environment, replacing each action emitted by the policy with a random action with probability ϵ.

B.2 Social Value Orientation
McKee et al. [59] introduce and define the SVO algorithmic component across three descriptive levels [37,57]. Their original implementation provides a pseudoreward to SVO agents based on the divergence between their target reward angle and the realized reward angle, in combination with a weight parameter w. Eq. (3) in [59] defines this implementation.
Here we introduce a new implementation of SVO that retains the computational and algorithmic levels as described above, but replaces the divergence-penalization method with an approach based on vector projection (Figure A4). Within the Markov game framework described in [59], we define the vector-projection approach with a function U_i to be maximized by agent i:

U_i(s, o_i, a_i, R⃗) = r_i cos(θ) + r_{−i} sin(θ)    (1)

Here, s is the current environmental state; o_i is the observation received by agent i; a_i is the action selected by agent i through a policy π(a_i | o_i); r_i is the environmental reward received by agent i, contained within the state reward vector R⃗ = (r_1, ..., r_n) for all n agents in the environment; θ is the Social Value Orientation for player i (representing player i's target distribution of reward among group members); and r_{−i} is a statistic summarizing the rewards of all other group members from R⃗. For these experiments, we choose to define r_{−i} as the arithmetic mean: r_{−i} = (1 / (n − 1)) Σ_{j ≠ i} r_j, where n is the group size. This formulation of SVO parallels the model of human preferences described by Griesinger and Livingston [36].
This utility function leaves most of its inputs unused, aside from information contained in the group reward vector R⃗. Consequently, the vector-projection approach may also be referenced as U_i(R⃗), as in Figure A4.
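A minimal sketch of this vector-projection utility, following Eq. (1) (the function below is illustrative rather than our training code):

```python
import math

def svo_utility(rewards, i, theta_deg):
    """Scalar projection of the reward vector (r_i, r_-i) onto the agent's SVO
    angle theta, where r_-i is the mean reward of all other agents (Eq. 1)."""
    r_i = rewards[i]
    others = [r for j, r in enumerate(rewards) if j != i]
    r_minus_i = sum(others) / len(others)
    theta = math.radians(theta_deg)
    return r_i * math.cos(theta) + r_minus_i * math.sin(theta)

# theta = 0 recovers purely selfish utility (U_i = r_i);
# theta = 45 weights the agent's own reward and others' mean reward equally.
print(svo_utility([1.0, -2.0], i=0, theta_deg=45.0))   # ≈ -0.71
```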

B.3 Training protocol
Agents trained in a distributed set of environments running in parallel. Each agent trained using one GPU, for a total of 5e7 environment steps across 150 environments. Our distributed framework checkpointed agents every 15 minutes during training.

Fig. A4: (a) In the Markov game framework, agents select actions to take on the basis of observations and rewards they receive from the environment. We construct an advantage actor-critic (A2C) agent with two algorithmic modifications: a Social Value Orientation (SVO) component that takes the group reward vector R⃗ as input and computes utility U_i for the critic's use; and a "trembling hand" component that changes the actor's output a_i with some probability ϵ, producing a modified action â_i. The agent otherwise operates as usual, with the actor initially selecting an action given the agent's observation o_i, and the critic computing an advantage estimate A from the agent's observation o_i, its utility U_i, and the agent's modified action â_i. For more information on A2C, see [62]. (b) The SVO component computes U_i through vector projection. On the left, the black dot represents the reward vector given by the environment and defined by r_i and r_{−i}. The dotted blue line represents a vector along the agent's SVO angle. On the right, U_i is the magnitude of the blue vector computed by projecting the environmental reward vector on the dotted blue line. See Eq. (1) for the precise formulation.

C Human-agent interaction studies
This appendix provides detailed information on the design of the human-agent interaction studies and on the statistical analyses conducted in each study.

C.1 Study details
All three studies received a favorable opinion from the Human Behavioural Research Ethics Committee at DeepMind (#19/004) and were approved by the Institutional Review Board for Human Subjects at Princeton University (#11885).
Each study applied the same inclusion criteria during recruitment on Prolific: residence in the United States, prior completion of at least 20 studies on Prolific, and an approval rating of 95% or more.
We include screenshots showing how Study 1 unfolded for each participant:
1. Read general instructions on the study and gameplay (Figure A8).
2. Play practice episode (Figure A9).
3. Read instructions on game rules and co-players (Figures A10 and A11).
4. Play co-play episode with Co-player A and answer questions about perceptions (Figure A12).
5. Play co-play episode with Co-player B and answer questions about perceptions and preferences (Figure A13).
6. Repeat steps 4 and 5 for another 10 episodes.
7. Transition to post-task questionnaire (Figure A14).
Figure A15 shows an example open-ended impression question from the post-task questionnaire.
During gameplay, a "ticker" at the top of the screen displayed the three most recent coin collection events.The study interface did not provide participants with cumulative metrics (e.g., score or coin collections).
In all three studies, the practice episode instructed participants to collect five coins.The practice episode ended after the participant collected five coins or five minutes elapsed, whichever occurred first.
In Studies 1 and 2, we generated the sequence of co-players for each participant by listing the 12 possible pairwise combinations of agents, randomly shuffling the order of the pairs, and then randomly shuffling the order of the agents within each pair.
Study 2 closely resembled Study 1, aside from identifying the bonus earned per point as $0.02 (Figures A10a and A11h) and using the reward values from the shifted incentive structure (Table 2) when explaining the rules for matching and mismatching coin collections (Figures A11b-A11e).
Study 3 largely preserved the instructions preceding the first co-play episode (Figures A8-A11), though the wording on certain screens changed slightly to refer to a single upcoming episode rather than multiple episodes (e.g., "the next round" rather than "each round").The study branched more substantially starting just before the first co-play episode, as shown in Figures A16 and A17.After playing a single co-play episode, participants read a description of their partner choice for the following episode.They played the partner-choice episode as they chose (that is, either alone or with the co-player from the previous episode).

C.2 Analytic details
For our statistical analyses, we rely on several regression methods to help examine our data. In situations where our data consist of independent observations, we employ standard linear and logistic regressions to analyze the relationships between variables. For analyses involving repeated measurements collected from each participant, we leverage generalized linear mixed-effect models (GLMMs). These models offer flexibility to model different types of outcome measures (e.g., preferences reported on a bounded scale) and account for non-independent observations.


C.2.1 Study 1

The difference in the perceived warmth of the agents significantly altered stated pairwise preferences, OR = 2.23, 95% CI [2.08, 2.40], p < 0.001. The difference in the perceived competence of the agents significantly altered stated pairwise preferences, OR = 0.78, 95% CI [0.73, 0.84], p < 0.001. The effect of the joint intercept was not significant, p = 0.23. The model achieves an AIC of 1610.2 and R²m = 0.435.

A final GLMM directly compared the predictiveness of social perceptions against that of the score that participants received. For this model, we standardized the predictors, allowing direct comparison between the model coefficients. The GLMM predicted pairwise preferences from: a joint intercept; standardized variables for perceived warmth, perceived competence, and score; and a random intercept for participant. All three fixed effects were significant within this model (Figure A20). One standard deviation of change in the warmth variable (difference in perceived warmth between two agents) had the largest standardized effect on pairwise preferences, OR = 3.02, 95% CI [2.54, 3.60], p < 0.001. One standard deviation in score had the next largest effect, OR = 2.13, 95% CI [1.82, 2.50], p < 0.001. One standard deviation in the competence variable (difference in perceived competence between two agents) had the effect with the smallest magnitude, OR = 0.85, 95% CI [0.75, 0.96], p = 0.012. The effect of the joint intercept was significant, OR = 1.18, 95% CI [1.03, 1.36], p = 0.012.
Finally, a linear model predicted post-game impression sentiment from an intercept, perceived warmth, and perceived competence. Perceived warmth significantly correlated with impression sentiment, β = 0.13, 95% CI [0.09, 0.16], p < 0.001. Perceived competence did not exhibit a significant relationship with sentiment, p = 0.24. The effect of the intercept was not significant, p = 0.52.

C.2.2 Study 2
As an initial test, we sought to validate the effect of shifting the incentive structure on participants' rewards. A linear mixed-effects model predicted participant reward from a categorical variable representing the study number, with Study 1 as the reference level. As expected, the study variable exerted a significant effect on participant reward, β = 27.3, 95% CI [26.5, 28.0], p < 0.001.
As in Study 1, we expected that the social perception questions would exhibit suitable psychometric properties. Table 6 shows the high consistency with which participants evaluated a given agent on a given trait. A mixed ANOVA showed that social perceptions varied more between traits than within a particular trait, F 3,4650 = 88.5, p < 0.001. Finally, both the composite warmth measure (ρ = 0.92) and the composite competence measure (ρ = 0.91) exhibit high internal consistency, as measured by the Spearman-Brown formula.
A sequence of fractional response models assessed the predictive value of the hypothesized predictors for participants' stated pairwise preferences. We construct each model through a GLMM with a binomial link function, normalizing the full preference scale with zero and one as its endpoints. To ensure model comparability, all fractional response models predict the exact same outcome variable.
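Our models are GLMMs with participant random intercepts; as a simplified, illustrative sketch of the fractional-logit setup without the random effect, one could use statsmodels as below. The dataframe here is synthetic stand-in data, and the column names are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy stand-in data: stated pairwise preference rescaled to [0, 1], plus the
# A-minus-B differences in perceived warmth and competence.
rng = np.random.default_rng(0)
n = 200
warmth_diff = rng.normal(size=n)
competence_diff = rng.normal(size=n)
pref = 1 / (1 + np.exp(-(1.0 * warmth_diff - 0.2 * competence_diff + rng.normal(scale=0.5, size=n))))
df = pd.DataFrame({"pref": pref, "warmth_diff": warmth_diff, "competence_diff": competence_diff})

# Fractional logit: binomial family applied to a [0, 1] outcome (no random effects here).
model = smf.glm("pref ~ warmth_diff + competence_diff", data=df, family=sm.families.Binomial())
result = model.fit()
print(np.exp(result.params))  # coefficients expressed as odds ratios
```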
The first GLMM predicted stated preferences from a joint intercept, a fixed effect for the identities of the agents being compared, and a random effect for the participant. The underlying identities of the agents significantly affected the participant's stated pairwise preferences, χ²(11) = 549.7, p < 0.001. Figure A23 shows mean pairwise preferences calculated for each pair of agent identities. The effect of the joint intercept was not significant, p = 0.15. The model achieves an AIC of 1608.7 and R²m = 0.403.

The second GLMM predicted stated preferences from a joint intercept, a fixed effect for the difference in scores the participant received when playing with each of the agents being compared, and a random effect for the participant. The difference in scores earned by the participant significantly influenced stated pairwise preferences, OR = 1.06, 95% CI [1.06, 1.07], p < 0.001. The effect of the joint intercept was not significant, p = 0.81. The model achieves an AIC of 2049.4 and R²m = 0.214.

The third GLMM predicted stated preferences from a joint intercept, a fixed effect for the difference in perceived warmth for each of the agents being compared, a fixed effect for the difference in perceived competence for each of the agents, and a random effect for the participant. The difference in the perceived warmth of the agents significantly altered stated pairwise preferences, OR = 2.63, 95% CI [2.42, 2.89], p < 0.001. The difference in the perceived competence of the agents significantly affected stated pairwise preferences, OR = 0.82, 95% CI [0.76, 0.88], p < 0.001. The effect of the joint intercept was not significant, p = 0.99. The model achieves an AIC of 1510.4 and R²m = 0.496.

A final GLMM directly compared the predictiveness of social perceptions against that of the score that participants received. For this model, we standardized the predictors, allowing direct comparison between the model coefficients. The GLMM predicted pairwise preferences from: a joint intercept; standardized variables for perceived warmth, perceived competence, and score; and a random intercept for participant. All three fixed effects were significant within this model (Figure A24). One standard deviation of change in the warmth variable (difference in perceived warmth between two agents) had the largest standardized effect on pairwise preferences, OR = 5.56, 95% CI [4.74, 6.74], p < 0.001. One standard deviation in score had the next largest effect, OR = 1.50, 95% CI [1.31, 1.71], p < 0.001. One standard deviation in the competence variable (difference in perceived competence between two agents) had the effect with the smallest magnitude, OR = 0.80, 95% CI [0.70, 0.91], p < 0.001. The effect of the joint intercept was not significant, p = 0.49.

C.2.3 Study 3
In Study 3, each participant interacted with a single agent, and thus provided only a single measurement point for warmth and competence judgments. As a result, the ANOVAs and regressions reported below do not incorporate random effects for participants.
An ANOVA indicated that social perceptions varied more between traits than within a particular trait, $F_{3,1200} = 54.9$, $p < 0.001$. Both the composite warmth measure ($\rho = 0.85$) and the composite competence measure ($\rho = 0.86$) exhibit high internal consistency, as measured by the Spearman-Brown formula.
A two-way ANOVA modeled the effects of $\theta$, $\epsilon$, and their interaction on warmth evaluations. Figure A25a depicts the main effect of $\theta$ on perceived warmth (that is, marginalized over $\epsilon$ values), and Figure A26a visualizes the full two-way interaction. The following means and standard deviations help describe main effects by marginalizing over all other variables. Participants perceived $\theta = 45^\circ$ agents ($m = 3.14$, $sd = 1.22$) as significantly warmer than $\theta = 0^\circ$ agents ($m = 1.79$, $sd = 1.09$), $F_{1,297} = 103.4$, $p < 0.001$. The $\epsilon$ parameter did not exert a significant effect on warmth judgments, $F_{1,297} = 0.2$, $p = 0.62$. Finally, the interaction between the $\theta$ and $\epsilon$ parameters was significant, $F_{1,297} = 5.3$, $p = 0.022$.
A two-way ANOVA modeled the effects of $\theta$, $\epsilon$, and their interaction on competence judgments. Figure A25b depicts the main effect of $\epsilon$ on competence evaluations (that is, marginalized over $\theta$ values), and Figure A26b visualizes the full two-way interaction. The following means and standard deviations help describe main effects by marginalizing over all other variables. Participants judged $\epsilon = 0$ agents ($m = 3.75$, $sd = 1.15$) as significantly more competent than $\epsilon = 0.5$ agents ($m = 2.95$, $sd = 1.23$), $F_{1,297} = 35.3$, $p < 0.001$. The $\theta$ parameter did not exert a significant effect on competence evaluations, $F_{1,297} = 3.8$, $p = 0.05$. Finally, the interaction between the $\theta$ and $\epsilon$ parameters was not significant, $F_{1,297} = 1.2$, $p = 0.26$.
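These analyses correspond to standard two-way between-subjects ANOVAs over the crossing of $\theta$ and $\epsilon$. A minimal sketch in Python with statsmodels, using simulated ratings rather than the study data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 300

# Simulated stand-in for Study 3: one warmth rating per participant,
# with agents crossing theta (0 or 45 degrees) and epsilon (0 or 0.5).
df = pd.DataFrame({
    "theta": rng.choice([0, 45], size=n),
    "epsilon": rng.choice([0.0, 0.5], size=n),
})
df["warmth"] = 1.8 + 0.03 * df["theta"] + rng.normal(scale=1.1, size=n)

# Two-way ANOVA with interaction, treating theta and epsilon as factors
model = smf.ols("warmth ~ C(theta) * C(epsilon)", data=df).fit()
print(anova_lm(model, typ=2))
```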
A sequence of logistic regressions evaluated the predictive value of the hypothesized predictors for participants' revealed preferences.
The first logistic regression predicted revealed preferences from an intercept and the identity of the agent. The identity of the agent significantly affected participants' revealed preferences, $\chi^2(3) = 45.4$, $p < 0.001$. The effect of the intercept on revealed preferences was significant, OR = 1.88, 95% CI [1.18, 3.07], $p = 0.009$. The model achieves an AIC of 372.5 and $R^2 = 0.188$.
The second logistic regression predicted revealed preferences from an intercept and the score earned by the participant when playing with the agent. The score earned by the participant exerted a significant influence on revealed preferences, OR = 1.06, 95% CI [1.03, 1.08], $p < 0.001$. The effect of the intercept was also significant, OR = 0.16, 95% CI [0.08, 0.31], $p < 0.001$. The model achieves an AIC of 390.5 and $R^2 = 0.101$.
A final logistic regression directly compared the predictiveness of social perceptions against that of the score participants received. For this model, we standardized the predictors, allowing direct comparison between the model coefficients. The model predicted revealed preferences from an intercept and standardized variables for perceived warmth, perceived competence, and score (Figure A27). The effects of perceived warmth and of score were significant within the model. One standard deviation of change in perceived warmth had the larger effect on revealed preferences, OR = 2.27, 95% CI [1.65, 3.16], $p < 0.001$. One standard deviation in score had a smaller effect, OR = 1.47, 95% CI [1.09, 1.98], $p = 0.011$. The effect of perceived competence was not significant, $p = 0.44$. The effect of the intercept was significant, OR = 1.18, 95% CI [1.03, 1.36], $p = 0.006$.
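A minimal sketch of this kind of standardized logistic regression in Python with statsmodels is included below for reference; the data are simulated, and the variable names are placeholders rather than those from the study materials.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300

# Simulated stand-in data: one row per participant
df = pd.DataFrame({
    "chose_agent": rng.integers(0, 2, size=n),   # 1 = chose to keep the co-player
    "warmth": rng.normal(3.0, 1.2, size=n),
    "competence": rng.normal(3.3, 1.2, size=n),
    "score": rng.normal(20.0, 8.0, size=n),
})

# Standardize predictors so the resulting odds ratios are comparable
for col in ["warmth", "competence", "score"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std()

model = smf.logit("chose_agent ~ warmth + competence + score", data=df).fit()
print(np.exp(model.params))      # odds ratios
print(np.exp(model.conf_int()))  # 95% confidence intervals for the odds ratios
```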

Fig. 1: When humans encounter a new agent, they automatically and rapidly form an impression of the agent (social perception). The human can leverage their impression to decide whether to continue or discontinue the interaction (partner choice).

Fig. 2: Screenshot of gameplay in Coins. Two players move around a small room and collect colored coins. Coins randomly appear in the room over time. Players receive reward from coin collections depending on the match or mismatch between their color and the coin color.
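The exact payoff values appear in Tables 1 and 2. As a rough sketch, assuming the incentive structure used in earlier Coins studies (the collector gains one point from any coin, and a mismatched collection additionally costs the other player two points), per-collection rewards might be computed as follows. These values are an assumption for illustration, not a statement of the structures used here.

```python
def coin_collection_rewards(collector_color: str, coin_color: str) -> dict:
    """Per-event rewards when one player collects a coin in Coins.

    Assumed values (not taken from this paper's tables): the collector
    always gains +1, and a mismatched collection costs the other player 2.
    """
    rewards = {"collector": 1.0, "other": 0.0}
    if collector_color != coin_color:  # mismatch between player and coin color
        rewards["other"] = -2.0
    return rewards

print(coin_collection_rewards("red", "red"))   # match
print(coin_collection_rewards("red", "blue"))  # mismatch
```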

(c) Revealed preferences: participants answer the question "Would you like to play the next round alone, or with the same co-player?" by choosing between "1: Play alone" and "2: Play with the same co-player."

Fig. 4: Performance metrics over agent training. Selfish agents quickly learned to collect coins, but did not learn to avoid mismatches. As a result, collective return hovered around zero. Prosocial agents exhibited slower learning and collected fewer coins on average, but also learned to avoid mismatching coins. As a result, collective return increased markedly over training. Error bands represent 95% confidence intervals over 100 evaluation episodes run at regular training checkpoints.

3 Results

3.1 Agent training

Figure 4 displays coin collections and score over the course of agent training. The training curves for $\theta = 0^\circ$ agents closely resemble those from previous studies [51]: selfish agents quickly learn to collect coins, but never discover the cooperative strategy of picking up only matching coins. As a result, collective return remains at zero throughout training. Prosocial ($\theta = 45^\circ$) agents, on the other hand, learn to avoid mismatching coins, substantially increasing their scores over the course of training. We evaluate agents with $\epsilon \in \{0, 0.25, 0.5, 0.75, 1\}$ to understand the effect of the trembling-hand module on agent behavior (Figures A5-A7). As expected, higher $\epsilon$ values degrade performance. Total coin collections decrease with increasing $\epsilon$ for both selfish and prosocial agents. Higher levels of $\epsilon$ cause prosocial agents to become less discerning at avoiding mismatching coins, and consequently produce lower levels of collective return.

Fig. 7: Relationship between social perception and subjective preferences in Study 1. The difference in participants' evaluations of the warmth of co-player A over co-player B significantly correlates with their stated relative preference for co-player A, $p < 0.001$. Perceived competence exhibits a similar (significant) relationship with preferences, $p < 0.001$. (a) and (b) depict odds ratios and preference predictions, respectively, from a fractional-response regression. Error bars and bands represent 95% confidence intervals.
Panel titles: Effect of perceived warmth on post-game impression sentiment. Effect of perceived competence on post-game impression sentiment.

Fig. 10: Relationship between social perception and subjective preferences in Study 2. The difference in participants' judgments of warmth for co-players A and B exhibits a significant relationship with their stated preference for co-player A over co-player B, $p < 0.001$. Competence evaluations similarly contribute significantly to preference predictions, $p < 0.001$. (a) and (b) depict odds ratios and preference predictions, respectively, from a fractional-response regression. Error bars and bands reflect 95% confidence intervals.

Fig. 11: Relationship between social perception and impression sentiment in Study 2. The sentiment in impressions of different co-players correlated with participants' evaluations of both (a) warmth, $p < 0.001$, and (b) competence, $p = 0.037$. Error bands indicate 95% confidence intervals.

Fig. 13: Relationship between social perception and subjective preferences in Study 3, as modeled through logistic regression. Participants' perceptions of warmth demonstrate a significant relationship with revealed preferences for co-players, $p < 0.001$. Competence judgments did not significantly correlate with revealed preferences, $p = 0.44$. (a) and (b) depict odds ratios and preference predictions, respectively, from a logistic regression. Error bars and bands indicate 95% confidence intervals.

Fig. 14: Relationship between social perception and impression sentiment in Study 3. The sentiment in participants' impressions of their different co-players correlated with their perceptions of both (a) warmth, $p < 0.001$, and (b) competence, $p = 0.041$. Error bands reflect 95% confidence intervals.

Fig. A1: Entities and sprites used in the Coins task.

1. No-op: Makes no change to the player's position.
2. Move up: Moves the player up one cell.
3. Move down: Moves the player down one cell.
4. Move left: Moves the player left one cell.
5. Move right: Moves the player right one cell.
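One way to encode this five-action set in code is sketched below; the integer indices and the row/column coordinate convention are illustrative assumptions, not taken from the environment implementation.

```python
from enum import IntEnum

class Action(IntEnum):
    """Discrete action set for Coins, matching the list above."""
    NOOP = 0
    MOVE_UP = 1
    MOVE_DOWN = 2
    MOVE_LEFT = 3
    MOVE_RIGHT = 4

# Offsets applied to a (row, col) position; rows are assumed to grow downward.
ACTION_OFFSETS = {
    Action.NOOP: (0, 0),
    Action.MOVE_UP: (-1, 0),
    Action.MOVE_DOWN: (1, 0),
    Action.MOVE_LEFT: (0, -1),
    Action.MOVE_RIGHT: (0, 1),
}

def apply_action(position: tuple[int, int], action: Action) -> tuple[int, int]:
    """Return the new (row, col) position after taking `action`."""
    dr, dc = ACTION_OFFSETS[action]
    return position[0] + dr, position[1] + dc

print(apply_action((5, 5), Action.MOVE_LEFT))  # -> (5, 4)
```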

Fig. A2: An $11 \times 11$ room, as used in all co-play episodes in the human-agent cooperation studies.

Fig. A3: Observations for the Coins task, depicting different frames of reference for the same example game state. (a) Agent players observe Coins with an egocentric reference frame. (b) Human players observe Coins with an allocentric reference frame.

Fig. A4: Agent details. (a) In the Markov game framework, agents select actions to take on the basis of observations and rewards they receive from the environment. We construct an advantage actor-critic (A2C) agent with two algorithmic modifications: a Social Value Orientation (SVO) component that takes the group reward vector $\vec{R}$ as input and computes utility $U_i$ for the critic's use; and a "trembling hand" component that changes the actor's output $a_i$ with some probability $\epsilon$, producing a modified action $\hat{a}_i$. The agent otherwise operates as usual, with the actor initially selecting an action given the agent's observation $o_i$, and the critic computing an advantage estimate $A$ from the agent's observation $o_i$, its utility $U_i$, and the agent's modified action $\hat{a}_i$. For more information on A2C, see [62]. (b) The SVO component computes $U_i$ through vector projection. On the left, the black dot represents the reward vector given by the environment and defined by $r_i$ and $r_{-i}$. The dotted blue line represents a vector along the agent's SVO angle. On the right, $U_i$ is the magnitude of the blue vector computed by projecting the environmental reward vector onto the dotted blue line. See Eq. 1 for the precise formulation.
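A small Python sketch of the two components described in this caption follows. The scalar-projection form of $U_i$ matches the geometric description above but is a reconstruction rather than a transcription of Eq. 1, and the uniform resampling in the trembling-hand step is an assumption.

```python
import numpy as np

def svo_utility(r_self: float, r_other: float, theta_deg: float) -> float:
    """Project the reward vector (r_self, r_other) onto the unit vector at
    the agent's SVO angle; the projection's magnitude serves as utility."""
    theta = np.deg2rad(theta_deg)
    return r_self * np.cos(theta) + r_other * np.sin(theta)

def trembling_hand(action: int, n_actions: int, epsilon: float,
                   rng: np.random.Generator) -> int:
    """With probability epsilon, replace the actor's chosen action with a
    uniformly sampled action (uniform resampling is assumed here)."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return action

rng = np.random.default_rng(0)
# A prosocial (45-degree) agent is penalized when its partner loses reward...
print(svo_utility(r_self=1.0, r_other=-2.0, theta_deg=45))  # approx -0.71
# ...whereas a selfish (0-degree) agent ignores the partner's reward entirely.
print(svo_utility(r_self=1.0, r_other=-2.0, theta_deg=0))   # 1.0
print(trembling_hand(action=2, n_actions=5, epsilon=0.5, rng=rng))
```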

Fig. A12: Screenshots of an "A" co-play episode and subsequent questions on perceptions in Study 1. (e) and (f) show the screens for eliciting and confirming one judgment; the study repeats these screens for all four judgments, randomizing the sequence of judgments for each episode.

Fig. A13: Screenshots of a "B" co-play episode and subsequent questions on perceptions and preferences in Study 1. (e) and (f) show the screens for eliciting and confirming one judgment; the study repeats these screens for all four judgments, randomizing the sequence of judgments for each episode.
(a) Screen 37: Confirm bonus and transition to post-task questionnaire.

Fig. A15: Screenshot of the open-ended impression question from the post-task questionnaire in Study 1.

Fig. A16: Screenshots of instructions, co-play episode, and subsequent questions on perceptions in Study 3. (g) and (h) show the screens for eliciting and confirming one judgment; the study repeats these screens for all four judgments, randomizing the sequence of judgments for each episode.

Fig. A17: Screenshots of instructions, partner choice question, and partner-choice episode in Study 3.

Fig. A20: Odds ratios in Study 1. Asterisks indicate that predictors are centered and normalized by one standard deviation to permit fair comparison between their effect sizes.

Fig. A21: Main effects of the SVO and trembling-hand components on social perceptions in Study 2. (a) An agent's SVO parameter significantly affected warmth perceptions, $p < 0.001$. (b) The trembling-hand component significantly influenced competence evaluations, $p < 0.001$. Error bars reflect 95% confidence intervals.

Table 1: Canonical incentive structure for Coins.

Table 2: Alternative incentive structure for Coins.

Table 5: Metrics for fractional response models predicting preferences in Study 1. Lower values of AIC and higher values of $R^2_m$ indicate stronger fits.

Table 6: Participants' evaluations of their co-players were highly consistent in Study 2, as assessed by ICC. Higher values of ICC indicate greater consistency.

Table 7: Metrics for fractional response models predicting preferences in Study 2. Lower values of AIC and higher values of $R^2_m$ indicate stronger fits.

Table 8: Metrics for logistic models predicting partner choice in Study 3. Lower values of AIC and higher values of $R^2$ indicate stronger fits.

1. At the computational level, McKee et al. propose SVO as a mechanism "redefin[ing] self-interest to incorporate the interests of a broader group."
2. At the algorithmic level, McKee et al. introduce "reward angles" as a method of representing distributions of reward over the self and the group.
3. At the implementation level, McKee et al. offer a penalty-based approach that assigns