1 Introduction

Trust is central to the development and deployment of artificial intelligence (AI) [41, 89]. However, many members of the public harbor doubts and concerns about the trustworthiness of AI [15, 21, 25, 44]. This presents a pressing issue for cooperation between humans and AI agents [18].

Algorithmic development research has been slow to recognize the importance of trust and preferences for cooperative agents. Recent studies show that deep reinforcement learning can be used to train interactive agents for human-agent collaboration [14, 55, 91, 92]. The “human-compatible” agents from these experiments demonstrate compelling improvements in game score, task accuracy, and win rate over established benchmarks. However, a narrow focus on “objective” metrics of performance obscures any differences in subjective preferences humans develop over cooperative agents. Two agents may generate similar benefits in terms of typical performance metrics, but human teammates may nonetheless express a strong preference for one over the other [87, 91]. Developing human-compatible, cooperative agents will require evaluating agents on dimensions other than objective performance.

Fig. 1 When humans encounter a new agent, they automatically and rapidly form an impression of the agent (social perception). The human can leverage their impression to decide whether to continue or discontinue the interaction (partner choice)

What shapes subjective preferences for artificial agents, if not a direct mapping of agent performance? One possible source of variance is social perception. When encountering a novel actor, humans rapidly and automatically evaluate the actor along two underlying dimensions: warmth and competence [1, 28, 29, 58]. These perceptions help individuals “make sense of [other actors] in order to guide their own actions and interactions” [27] (Fig. 1). The competence dimension aligns with the established focus on performance and score in machine learning research [10]: How effectively can this actor achieve its interests? Appraising an actor’s warmth, on the other hand, raises a novel set of considerations: How aligned are this actor’s goals and interests with one’s own? Research on social cognition consistently demonstrates that humans prefer others who are not only competent, but also warm [1, 30]. Hence, we predict that perceived warmth will be an important determinant of preferences for artificial agents.

Here we run behavioral experiments to investigate social perception and subjective preferences in human-agent interaction. We train reinforcement learning agents to play Coins, a mixed-motive game, varying agent hyperparameters known to influence cooperative behavior and performance in social dilemmas. Three co-play experiments then recruit human participants to interact with the agents, measure participants’ judgments of agent warmth and competence, and elicit participant preferences over the agents.

Experiments evaluating human views of agents frequently rely on stated preferences, typically by directly asking participants which of two agents they preferred as a partner [22, 87, 91]. Such self-report methods can be insightful tools for research [68]. However, they exhibit limited ecological validity and are vulnerable to experimenter demand [72]. In this paper, we overcome these challenges by eliciting revealed preferences [78]: Do people even want to interact with a given agent, if given the choice not to? Partner choice, or the ability to leave or reject an interaction, is a well-established revealed-preference paradigm in evolutionary biology and behavioral economics [5, 6, 12, 88]. While studies that measure revealed preferences (e.g., [49, 73]) are not inherently immune to experimenter demand, partner-choice measures can mitigate demand effects when participants are compensated based on their performance (“incentivized choice”; see [72]). Partner choice also carries external validity for interaction research: in the context of algorithmic development, we can view partner choice as a stand-in for the choice to adopt an artificial intelligence system [8, 20, 67]. For instance, users may test out several suggestions from a recommender system before deciding whether or not to rely on it for future decisions. Similarly, commuters might tentatively try several rides from self-driving cars to help choose whether to transition away from traditional driving. Overall, partner-choice study designs empower participants with an ability to embrace or leave an interaction with an agent—and thus incorporate an ethic of autonomy [9, 41, 61] into human-agent interaction research.

In summary, this paper makes the following contributions to cooperative AI research:

  1. Demonstrates the use of reinforcement learning to train human-compatible agents for a temporally and spatially extended mixed-motive game.

  2. Connects human-AI interaction research to frameworks from psychology and economics, identifying tools that researchers can easily import for their own studies.

  3. Illustrates how social perceptions affect stated and revealed preferences in incentivized interactions with agents, above and beyond standard performance metrics.

2 Methods

2.1 Task

Coins [31, 33, 51, 71] is a mixed-motive Markov game [54] played by \(n = 2\) players (Fig. 2; see also Appendix A for full task details). Players of two different colors occupy a small gridworld room with width w and depth d. Coins randomly appear in the room throughout the game, spawning on each cell with probability P. Every coin matches one of the two players in color. On each step of the game, players can stand still or move around the room.

The goal of the game is to earn reward by collecting coins. Players pick up coins by stepping onto them. Each coin collection generates reward for both players as a function of the match or mismatch between the coin color and the collecting player’s color. Under the canonical rules (Table 1), a player receives \(+1\) reward for picking up a coin of any color. If a player collects a coin of their own color (i.e., a matching coin), the other player is unaffected. However, if a player picks up a coin of the other player’s color (i.e., a mismatching coin), the other player receives \(-2\) reward. In the short term, it is always individually advantageous to collect an available coin, whether matching or mismatching. However, this strategy will lower the score of the other player. Players achieve the socially optimal outcome by collecting only the coins that match their color.
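To make this payoff rule concrete, here is a minimal Python sketch of the canonical incentive structure (Table 1); the function name and indexing scheme are illustrative assumptions on our part, not the authors' implementation.

```python
def coin_rewards(collector: int, coin_color: int, n_players: int = 2) -> list:
    """Per-player rewards when `collector` picks up a coin of `coin_color`.

    Players and coin colors share indices 0..n_players-1. Collecting any coin
    pays the collector +1; a mismatching coin costs the matching player -2.
    """
    rewards = [0.0] * n_players
    rewards[collector] += 1.0       # any collection pays the collector +1
    if coin_color != collector:     # mismatch hurts the coin's matching player
        rewards[coin_color] -= 2.0
    return rewards

# Matching coin:    player 0 collects a color-0 coin -> [1.0, 0.0]
# Mismatching coin: player 0 collects a color-1 coin -> [1.0, -2.0]
assert coin_rewards(0, 0) == [1.0, 0.0]
assert coin_rewards(0, 1) == [1.0, -2.0]
```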

Table 1 Canonical incentive structure for Coins
Fig. 2 Screenshot of gameplay in Coins. Two players move around a small room and collect colored coins. Coins randomly appear in the room over time. Players receive reward from coin collections depending on the match or mismatch between their color and the coin color

Two properties make Coins an ideal testbed for investigating perceptions of warmth and competence. First, as a consequence of its incentive structure, Coins is a social dilemma [48]: players can pursue selfish goals or prosocial goals. Second, relative to matrix games like the Prisoner’s Dilemma, Coins is temporally and spatially extended [50]: players can employ a variety of strategies to achieve their goals, with differing levels of efficiency. We hypothesize that these two features offer sufficient affordance for an observer to infer other players’ intentions and their effectiveness at enacting their intentions [11, 75, 97].

Our experiments use a colorblind-friendly palette, with red, blue, yellow, green, and purple players and coins (Fig. A1a). During agent training, we procedurally generate rooms with width w and depth d independently sampled from \(\mathcal {U}\{10, 15\}\). Coins appear in each cell with probability \(P = 0.0005\). Episodes last for \(T = 500\) steps. Each episode of training randomly samples colors (without replacement) for agents.

In our human-agent interaction studies, co-play episodes use \(w = d = 11\), \(P = 0.0005\), and \(T = 300\). Player colors are randomized across the five players (one human participant and four agent co-players) at the beginning of each study session, and held constant across all episodes within the session.
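For reference, the training and co-play parameters above can be summarized as illustrative configuration dictionaries (the key names are ours, not the authors'):

```python
import random

def sample_training_config() -> dict:
    # w and d drawn independently from U{10, ..., 15}; randint is inclusive.
    return {"width": random.randint(10, 15), "depth": random.randint(10, 15),
            "coin_prob": 0.0005, "episode_steps": 500}

# Fixed parameters for the human-agent co-play episodes.
COPLAY_CONFIG = {"width": 11, "depth": 11, "coin_prob": 0.0005,
                 "episode_steps": 300}
```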

In Study 1, humans and agents play Coins with the canonical rules. In Studies 2 and 3, humans and agents play Coins with a slightly altered incentive structure. Each outcome increases by \(+2\) reward, making all rewards in the game non-negative (Table 2). Since all rewards are offset by the same amount, this reward scheme preserves the social dilemma structure in Coins.
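The shifted scheme amounts to a constant offset on the canonical payoffs, as this sketch illustrates (reusing the hypothetical coin_rewards function above); again, this is an illustration rather than the authors' code.

```python
def coin_rewards_offset(collector, coin_color, n_players=2, offset=2.0):
    # Shift every outcome by +2 (Table 2). A constant offset leaves reward
    # differences, and hence the social dilemma structure, unchanged.
    return [r + offset for r in coin_rewards(collector, coin_color, n_players)]

# Mismatch example: canonical rewards [1, -2] become [3, 0]
assert coin_rewards_offset(0, 1) == [3.0, 0.0]
```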

Table 2 Alternative incentive structure for Coins

2.2 Agent design and training

We leverage deep reinforcement learning to train four agents for our human-agent cooperation studies (see Appendix B for full agent details). Overall, our study design and measurement tools are agnostic to the algorithmic implementation of the agents being evaluated. For this paper, the agents learn using the advantage actor-critic (A2C) algorithm [62]. The neural network consists of a convolutional module, a fully connected module, an LSTM with contrastive predictive coding [39, 66], and linear readouts for policy and value. Agents train for \(5 \times 10^7\) steps in self-play, with task parameters as described in Sect. 2.1. We consider two algorithmic modifications to the agents to induce variance in social perception.
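As a rough illustration of this architecture, the PyTorch sketch below wires together a convolutional module, a fully connected layer, an LSTM, and linear policy and value readouts. The framework choice, layer sizes, and the omission of the contrastive-predictive-coding auxiliary loss are all simplifying assumptions; Appendix B gives the authoritative details.

```python
import torch
import torch.nn as nn

class A2CNet(nn.Module):
    """Schematic agent network: conv -> fully connected -> LSTM -> heads."""

    def __init__(self, n_actions: int, channels: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(128)           # fully connected module
        self.lstm = nn.LSTM(128, 128, batch_first=True)
        self.policy = nn.Linear(128, n_actions)  # logits over actions
        self.value = nn.Linear(128, 1)           # state-value estimate

    def forward(self, obs, state=None):
        # obs: [batch, time, channels, height, width]
        b, t = obs.shape[:2]
        z = torch.relu(self.fc(self.conv(obs.flatten(0, 1)))).view(b, t, -1)
        h, state = self.lstm(z, state)
        return self.policy(h), self.value(h).squeeze(-1), state
```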

First, we build the Social Value Orientation (SVO) component [59], an algorithmic module inspired by psychological models of human prosocial preferences [36, 52, 63], into our agents. The SVO component parameterizes each agent with \(\theta\), representing a target distribution over their reward and the reward of other agents in their environment. SVO agents are intrinsically motivated [86] to optimize for task rewards that align with their parameterized target \(\theta\). For these experiments, we endow agents with the “individualistic” value \(\theta = 0^{\circ }\) and the “prosocial” value \(\theta = 45^{\circ }\).
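One plausible formulation of this intrinsic motivation is sketched below: the agent computes the angle of the observed reward distribution between itself and others and is penalized for deviating from its target \(\theta\). The functional form and penalty weight here are assumptions; Appendix B documents the actual module.

```python
import math

def svo_intrinsic_reward(r_self: float, r_others_mean: float,
                         theta_target_deg: float, weight: float = 1.0) -> float:
    # Observed "reward angle" between own reward and others' mean reward.
    theta_obs = math.degrees(math.atan2(r_others_mean, r_self))
    # Penalize deviation from the parameterized target angle theta.
    return r_self - weight * abs(theta_target_deg - theta_obs)

# theta = 0 deg rewards purely selfish outcomes; theta = 45 deg rewards
# outcomes that balance one's own reward with the other player's reward.
```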

Second, we add a “trembling hand” [17, 81] component to the agents for evaluation and co-play. With probability \(\epsilon\), the trembling-hand module replaces each action selected by the agent with a random action. This component induces inefficiency in maximizing value according to an agent’s learned policy and value function. For these experiments, we apply the “steady” value \(\epsilon = 0\) and the “trembling” value \(\epsilon = 0.5\).
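The trembling-hand component amounts to a simple action wrapper, sketched here with illustrative names:

```python
import random

def trembling_hand(action: int, n_actions: int, epsilon: float) -> int:
    """With probability epsilon, replace the policy's action with a random one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)  # the hand trembles: random action
    return action                           # steady: keep the selected action
```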

Table 3 Predictions for social perception (warmth and competence) as a function of agent hyperparameters (Social Value Orientation \({\theta }\) and trembling hand \({\epsilon }\))

Table 3 summarizes the hyperparameter values and predicted effects for the four evaluated agents.

2.3 Study design for human-agent studies

We recruited participants from Prolific [69, 70] for all studies (total \(N = 501\); 47.6% female, 48.4% male, 1.6% non-binary or trans; \(m_{\textrm{age}} = 33\), \(sd_{\textrm{age}} = 11\)). We received informed consent from all participants across the three studies. In each study, participants earned a base level of compensation for completing the study, with an additional bonus that varied as a function of their score in the task. Appendix C presents full details of our study design, including independent ethical review and study screenshots.

Overall, our studies sought to explore the relationship between social perception and subjective preferences in Coins. Study 1 approached these constructs using the canonical payoff structure for Coins [51] and an established self-report framework for eliciting (stated) preferences [91]. This study framework leverages a “within-participants design”, meaning that each participant interacts with all agents in the study. Within-participants designs increase the data points collected per participant and allow for better control of between-individual variance, thus improving statistical power for a given sample size.

We subsequently sought to understand whether the findings from Study 1 replicate under a partner choice framework. Does social perception exhibit the same predictive power for revealed preferences as it does for stated preferences? While within-participants designs offer several statistical advantages, they are not ideal for studying partner choices. Exposing participants to multiple potential partners can introduce order effects, where the exact sequence of interactions influences participants’ responses. Within-participants designs may also fatigue participants, potentially compromising the quality of their responses as the study progresses. Consequently, participants’ partner choices may be progressively less motivated by genuine preference and more influenced by extraneous factors. Thus, to study revealed preferences, we switched to a “between-participants design” in which each participant interacted with one randomly selected agent. Given that humans respond more strongly to losses than to commensurate gains [42], we made one additional change, testing participants’ partner choices under a shifted incentive structure with all non-negative outcomes (Table 2). To isolate the effects of the change from stated to revealed preferences, we approached these changes in two stages. Study 2 used the same stated-preference approach and within-participants design as Study 1, but incorporated the offset incentive structure. Study 3 then elicited revealed preferences in place of stated preferences.

We tested the following hypotheses in our studies:

H1

Social perception predicts participants’ stated and revealed preferences. That is, human participants will prefer to play with agents they perceive as warm and competent.

H2

Social perception predicts participants’ stated and revealed preferences, above and beyond the scores that participants receive. That is, participants’ social perceptions of agents will contribute to their preferences independently of the scores they receive when playing with the agents.

H3

Social perception will correlate positively with the sentiment of participants’ verbal impressions of the agents. That is, participants’ social perceptions of agents will emerge as positive sentiment in participants’ verbal descriptions of the agents.

2.3.1 Study 1

Fig. 3 Questionnaires administered in the human-agent interaction studies

Our first study aimed to explore the relationship between social perception and stated preferences across the four agents. We recruited \(N = 101\) participants from Prolific (45.5% female, 51.5% male; \(m_{\textrm{age}} = 34\), \(sd_{\textrm{age}} = 13\)). The study employed a within-participants design: each participant encountered and played with the full cohort of co-players (i.e., all four agents).

At the beginning of the study, participants read instructions and played a short tutorial episode alone, without a co-player, in order to learn the game rules and payoff structure (Table 1). The study instructed participants that they would receive $0.10 for every point earned during the remaining episodes. Participants then played 12 episodes with a randomized sequence of agent co-players, generated by concatenating every possible combination of co-players. Each of these co-play episodes lasted \(T = 300\) steps (1 min). After every episode, participants rated how “warm”, “well-intentioned”, “competent”, and “intelligent” the co-player from that episode was on five-point Likert-type scales (see Fig. 3a). After every two episodes, participants reported their preference over the agent co-players from those episodes on a five-point Likert-type scale (see Fig. 3b). In the experiment interface, we referred to the first agent co-player in each two-episode sequence as “co-player A”. The interface similarly referred to the second agent in each two-episode sequence as “co-player B”. Because the sequence of co-players was produced by concatenating all co-player combinations, each participant stated their preferences for every possible pairing of co-players.
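As a concrete illustration, such a sequence can be generated from the six unordered pairs of the four agents, each pair supplying one two-episode block (co-player A, then co-player B). The shuffling shown is an assumption consistent with the randomized sequences described above, and the agent labels are ours.

```python
import itertools
import random

agents = ["theta0_steady", "theta0_trembling",
          "theta45_steady", "theta45_trembling"]
pairs = [list(pair) for pair in itertools.combinations(agents, 2)]  # 6 pairs
random.shuffle(pairs)          # randomize the order of two-episode blocks
for pair in pairs:
    random.shuffle(pair)       # randomize who appears as co-player A vs. B
episodes = [agent for pair in pairs for agent in pair]
assert len(episodes) == 12     # every possible pairing is rated and compared
```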

After playing all 12 episodes, participants completed a short post-task questionnaire. The questionnaire first solicited open-ended responses about each of the encountered co-players, then collected standard demographic information and open-ended feedback on the study overall. The study took 22.4 min on average to complete, with a compensation base of $2.50 and an average bonus of $7.43.

2.3.2 Study 2

Our second study tested the relationship between social perception and stated preferences under the shifted incentive structure for Coins (Table 2). We recruited \(N = 99\) participants from Prolific (38.4% female, 55.6% male; \(m_{\textrm{age}} = 34\), \(sd_{\textrm{age}} = 12\)). The study employed the same within-participants design as Study 1, with one primary change: participants and agents played Coins under the shifted incentive structure. To keep bonus payments comparable to Study 1, we adjusted the bonus rate in Study 2. Participants received $0.02 for each point earned during non-tutorial episodes.

As before, participants played 12 episodes with a randomized sequence of agent co-players, generated such that they rated and compared every possible combination of co-players. As in Study 1, the interface referred to the first agent co-player in each two-episode sequence as “co-player A” and to the second agent as “co-player B”. The study took 23.2 min on average to complete, with a compensation base of $2.50 and an average bonus of $6.77.

2.3.3 Study 3

Our final study assessed whether the predictiveness of social perception extends to a revealed-preference framework. We recruited \(N = 301\) participants from Prolific (51.3% female, 45.0% male, 1.7% non-binary; \(m_{\textrm{age}} = 33\), \(sd_{\textrm{age}} = 11\)). In contrast with the preceding studies, Study 3 employed a between-participants design: each participant interacted with a single, randomly sampled agent.

The majority of the study introduction remained the same as in Study 2, with some instructions altered to inform participants they would play Coins with a single co-player (as opposed to multiple co-players, as in Studies 1 and 2). After reading the instructions and playing a short tutorial episode alone, participants played one episode of Coins with a randomly sampled co-player. After this episode, participants rated how “warm”, “well-intentioned”, “competent”, and “intelligent” their co-player was on five-point Likert-type scales (see Fig. 3a). Participants subsequently learned that they would be playing one additional episode, with the choice of playing alone or playing with the same co-player. Participants indicated through a binary choice whether they wanted to play alone or with the co-player (see Fig. 3c). They proceeded with the episode as chosen, and then completed the standard post-task questionnaire.

The study took 6.2 min on average to complete, with a compensation base of $1.25 and an average bonus of $1.25.

3 Results

3.1 Agent training

Figure 4 displays coin collections and score over the course of agent training. The training curves for \(\theta = 0^{\circ }\) agents closely resemble those from previous studies [51]: selfish agents quickly learn to collect coins, but never discover the cooperative strategy of picking up only matching coins. As a result, collective return remains at zero throughout training. Prosocial (\(\theta = 45^{\circ }\)) agents, on the other hand, learn to avoid mismatching coins, substantially increasing their scores over the course of training.

Fig. 4 Performance metrics over agent training. Selfish agents quickly learned to collect coins, but did not learn to avoid mismatches. As a result, collective return hovered around zero. Prosocial agents exhibited slower learning and collected fewer coins on average, but also learned to avoid mismatching coins. As a result, collective return increased markedly over training. Error bands represent 95% confidence intervals over 100 evaluation episodes run at regular training checkpoints

We evaluate agents with \(\epsilon \in \{ 0, 0.25, 0.5, 0.75, 1 \}\) to understand the effect of the trembling-hand module on agent behavior (Figs. A5–A7). As expected, higher \(\epsilon\) values degrade performance. Total coin collections decrease with increasing \(\epsilon\) for both selfish and prosocial agents. Higher levels of \(\epsilon\) cause prosocial agents to become less discerning at avoiding mismatching coins, and consequently produce lower levels of collective return.

3.2 Human-agent studies

In addition to the results and information presented here, Appendix C offers expanded explanations and full details of our statistical analyses.

3.2.1 Study 1

Participants played with each agent three times during the study, evaluating the relevant agent after each episode of play. Participants did not make judgments at random; their responses were highly consistent across their interactions with each agent (Table 4). At the same time, participants were not submitting vacuous ratings. Perceptions varied significantly as a function of which trait participants were evaluating, \(F_{3,4744} = 96.2\), \(p < 0.001\).
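Intraclass correlations of this kind can be computed with, for example, the pingouin package. The sketch below uses synthetic placeholder data, and the mapping of targets and raters onto agents and episodes is one plausible choice, not necessarily the authors'.

```python
import pandas as pd
import pingouin as pg

# Synthetic placeholder data: three repeated warmth ratings of four agents.
ratings_long = pd.DataFrame({
    "agent":   ["a1", "a2", "a3", "a4"] * 3,
    "episode": [1] * 4 + [2] * 4 + [3] * 4,
    "warmth":  [4, 1, 5, 2, 4, 2, 5, 1, 5, 1, 4, 2],
})
icc = pg.intraclass_corr(data=ratings_long, targets="agent",
                         raters="episode", ratings="warmth")
print(icc[["Type", "ICC", "CI95%"]])
```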

Table 4 Participants’ evaluations of their co-players were highly consistent, as assessed by intraclass correlation coefficient (ICC) [84]
Fig. 5 Main effects of algorithmic components on social perceptions in Study 1. a An agent’s Social Value Orientation (SVO) significantly influenced perceived warmth, \({p < 0.001}\). b Similarly, the trembling-hand component significantly changed competence judgments, \({p < 0.001}\). Error bars indicate 95% confidence intervals

Psychology research often employs composite measures to assess cognitive constructs (attributes and variables that cannot be directly observed). Combining multiple individual measures into composite scales can reduce measurement error and provide a more stable estimate of the latent construct underlying the scale [53, 56]. Following standard practice in social perception research [30], we computed two composite measures for further analysis. A composite warmth measure averaged participants’ judgments of how “warm” and how “well-intentioned” their co-player was. A composite competence measure similarly combined individual judgments of how “competent” and “intelligent” each co-player was. Both composite measures exhibit high scale reliability as measured by the Spearman-Brown formula [23], with \(\rho = 0.93\) for the composite warmth measure and \(\rho = 0.92\) for the composite competence measure.
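A minimal sketch of the composite construction and the two-item Spearman-Brown reliability, \(\rho = 2r/(1+r)\) with r the inter-item Pearson correlation; the data-frame column names are illustrative assumptions.

```python
import pandas as pd

def composite_and_reliability(df: pd.DataFrame, item_a: str, item_b: str):
    r = df[item_a].corr(df[item_b])                # inter-item correlation
    rho_sb = 2 * r / (1 + r)                       # Spearman-Brown, two items
    composite = df[[item_a, item_b]].mean(axis=1)  # item average per response
    return composite, rho_sb

# warmth, rho_w = composite_and_reliability(ratings, "warm", "well_intentioned")
# competence, rho_c = composite_and_reliability(ratings, "competent", "intelligent")
```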

Social perception As expected, the SVO and trembling-hand algorithmic components generated markedly divergent appraisals of warmth and competence. Participants perceived high-SVO agents as significantly warmer than low-SVO agents, \(F_{1,1108} = 1006.8\), \(p < 0.001\) (Fig. 5a). Similarly, steady agents came across as significantly more competent than trembling agents, \(F_{1,1108} = 70.6\), \(p < 0.001\) (Fig. 5b). Jointly, the algorithmic effects prompted distinct impressions in the warmth-competence space (Fig. 6).

Fig. 6 Overall pattern of perceived warmth and competence in Study 1. Error bars reflect 95% confidence intervals

Stated preferences How well do participants’ perceptions predict subjective preferences, relative to predictions made based on objective score? We fit competing fractional response models to assess the influence of score and social perception, respectively, on self-reported preferences. We then compared model fit using the Akaike information criterion (AIC) [3] and Nakagawa’s \(R^2\) [65]. We fit an additional baseline model using algorithm identities (i.e., which two agents participants were comparing) as a predictor.
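A hedged sketch of how such fractional-response (fractional logit) models can be fit with statsmodels, regressing preferences rescaled to [0, 1] on within-pair differences; the synthetic data frame and column names are assumptions, not the authors' analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic placeholder data for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "pref_01": rng.uniform(0, 1, 200),        # preference rescaled to [0, 1]
    "score_diff": rng.normal(0, 10, 200),     # score with A minus score with B
    "warmth_diff": rng.normal(0, 1, 200),     # warmth of A minus warmth of B
    "competence_diff": rng.normal(0, 1, 200),
})

score_model = smf.glm("pref_01 ~ score_diff", data=df,
                      family=sm.families.Binomial()).fit()
perception_model = smf.glm("pref_01 ~ warmth_diff + competence_diff", data=df,
                           family=sm.families.Binomial()).fit()

print(score_model.aic, perception_model.aic)  # lower AIC indicates better fit
print(np.exp(perception_model.params))        # coefficients as odds ratios
```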

Table 5 Metrics for fractional response models predicting preferences in Study 1

The model leveraging algorithm identities and the model leveraging participant scores both accounted for a large amount of variance in subjective preferences (Table 5, top and middle rows). Participants exhibited a clear pattern of preferences across the four agents (Fig. A19). In pairwise comparison, participants favored the \(\theta = 45^{\circ }\) agents over both \(\theta = 0^{\circ }\) agents, and the \(\theta = 0^{\circ }\), \(\epsilon = 0.5\) agent over the \(\theta = 0^{\circ }\), \(\epsilon = 0\) agent. The score model indicated that the higher a participant scored with co-player A relative to co-player B, the more they reported preferring co-player A, with an odds ratio \(\textrm{OR} = 1.12\), 95% CI [1.11, 1.13], \(p < 0.001\).

Nevertheless, knowing participants’ judgments generates substantially better predictions of their preferences than the alternatives (H1; Table 5, bottom row). Both perceived warmth and perceived competence contribute to this predictiveness (Fig. 7a). The warmer a participant judged co-player A relative to co-player B, the more they reported preferring co-player A, \(\textrm{OR} = 2.23\), 95% CI [2.08, 2.40], \(p < 0.001\) (Fig. 7b). Unexpectedly, the more competent co-player A appeared relative to co-player B, the less participants tended to favor co-player A, \(\textrm{OR} = 0.78\), 95% CI [0.73, 0.84], \(p < 0.001\).

As a further test of the predictive power of participants’ social perceptions, we fit another regression with perceived warmth and competence as predictors, this time including score as a covariate (i.e., controlling for the effect of score). Score significantly and positively predicts preference in this model, \(p < 0.001\) (Fig. A20). Even so, the effects of warmth and competence remain significant, with \(p < 0.001\) and \(p = 0.012\), respectively. Among these three predictors, perceived warmth exhibits the largest effect on co-player preferences. That is, it provides a substantial independent signal alongside score and perceived competence when used to predict stated preferences. Social perception thus improves model fit above and beyond that provided by score alone (H2).

Fig. 7 Relationship between social perception and subjective preferences in Study 1. The difference in participants’ evaluations of the warmth of co-player A over co-player B significantly correlates with their stated relative preference for co-player A, \(p < 0.001\). Perceived competence exhibits a similar (significant) relationship with preferences, \(p < 0.001\). a and b depict odds ratios and preference predictions, respectively, from a fractional-response regression. Error bars and bands represent 95% confidence intervals

Impression sentiment As a supplementary analysis, we explore the open-ended responses participants provided about their co-players at the end of the study. For the most part, participants felt they could recall their co-players well enough to offer their impressions through written descriptions: in aggregate, participants provided impressions for \(82.2\%\) of the agents they encountered.

For a quantitative perspective on the data, we conduct sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner) [40]. Echoing the correspondence between warmth and stated preferences, the warmer participants perceived a co-player throughout the study, the more positively they tended to describe that co-player, \(\beta = 0.13\), 95% CI [0.09, 0.16], \(p < 0.001\) (Fig. 8). In contrast, competence did not exhibit a significant relationship with sentiment, \(p = 0.24\). Warmth evaluations, but not competence evaluations, correlated positively with the sentiment of participants’ impressions toward their co-players (H3).
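The VADER scoring step can be reproduced with the vaderSentiment package; taking the compound score in [-1, 1] as the summary measure is our assumption.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
impression = "They seemed very hostile and really just wanted the most points."
print(analyzer.polarity_scores(impression)["compound"])  # negative sentiment
```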

Fig. 8 Relationship between social perception and impression sentiment in Study 1. The sentiment that participants expressed toward different co-players correlated with (a) their evaluations of warmth, \(p < 0.001\), but not with (b) their judgments of competence, \(p = 0.24\). Error bands represent 95% confidence intervals

Anecdotally, participants expressed a wide range of emotions while describing their co-players. The \(\theta = 45^{\circ }\) agents often evoked contrition and guilt:

  • “The red player seemed almost too cautious in going after coins which worked for me but made them seem easy to pick on, even though I wouldn’t do that.”

  • “I think I remember red being too nice during the game. It made me feel bad so I tried not to take many points from them.”

  • “This one wasn’t very smart and I stole some of their coins because it was easy. I feel kind of bad. It moved so erratically.”

Table 6 Participants’ evaluations of their co-players were highly consistent in Study 2, as assessed by ICC

Participants discussed the \(\theta = 0^{\circ }\) agents, on the other hand, with anger and frustration:

  • “Very aggressive play-style. Almost felt like he was taunting me. Very annoying.”

  • “They seemed very hostile and really just wanting to gain the most points possible.”

  • “I felt anger and hatred towards this green character. I felt like downloading the code for this program and erasing this character from the game I disliked them so much. They were being hateful and mean to me, when we both could have benefited by collecting our own colors.”

3.2.2 Study 2

Our second study tested whether these effects and results remained robust when participants played Coins under a shifted incentive structure. The alternative structure increased the rewards for coin collections so that players cannot receive negative rewards (Table 2). As expected, this shift resulted in participants earning significantly higher scores than those achieved in Study 1, \(\beta = 27.3\), 95% CI [26.5, 28.0], \(p < 0.001\).

Overall, the perceptual and preference patterns from Study 2 replicated under the alternative incentive structure. As before, participants’ warmth and competence evaluations display satisfactory psychometric properties. Participants’ judgments varied significantly depending on the trait in question, \(F_{3,4650} = 88.5\), \(p < 0.001\). At the same time, participants rated individual agents consistently for each given trait (Table 6). The composite measures show high scale reliability, with \(\rho = 0.92\) for the composite warmth measure and \(\rho = 0.91\) for the composite competence measure.

Social perception The SVO and trembling-hand algorithmic components prompted diverse appraisals of warmth and competence (Fig. 9). Participants perceived high-SVO agents as significantly warmer than low-SVO agents, \(F_{1,1086} = 981.9\), \(p < 0.001\) (Fig. A21a). Similarly, participants judged steady agents as significantly more competent than trembling agents, \(F_{1,1086} = 76.0\), \(p < 0.001\) (Fig. A21b).

Fig. 9 Overall pattern of perceived warmth and competence in Study 2. Error bars reflect 95% confidence intervals

Stated preferences We again fit fractional response regressions to understand the relationship between objective metrics, perceptions, and subjective preferences.

Table 7 Metrics for fractional response models predicting preferences in Study 2

The model with co-player identities as predictors captured a large amount of variance in stated preferences (Table 7, top row). Participants reported a distinct pattern of preferences across the agents (Fig. A23). In pairwise comparison, participants favored the \(\theta = 45^{\circ }\) agents over the \(\theta = 0^{\circ }\) agents, and the \(\theta = 0^{\circ }\), \(\epsilon = 0.5\) agent over the \(\theta = 0^{\circ }\), \(\epsilon = 0\) agent. The model with participant score as the sole predictor performed considerably worse than it did in Study 1 (Table 7, middle row). Still, it captured the same pattern as before: the higher a participant scored with co-player A relative to co-player B, the greater their preferences for co-player A, \(\textrm{OR} = 1.06\), 95% CI [1.06, 1.07], \(p < 0.001\).

Fig. 10 Relationship between social perception and subjective preferences in Study 2. The difference in participants’ judgments of warmth for co-players A and B exhibits a significant relationship with their stated preference for co-player A over co-player B, \(p < 0.001\). Competence evaluations similarly significantly contribute to preference predictions, \(p < 0.001\). a and b depict odds ratios and preference predictions, respectively, from a fractional-response regression. Error bars and bands reflect 95% confidence intervals

Participants’ perceptions again serve as a better foundation for preference predictions than either game score or the identity of the specific algorithms they encountered (H1; Table 7, bottom row, and Fig. 10a). The warmer a participant perceived co-player A relative to co-player B, the more they reported preferring co-player A, \(\textrm{OR}=2.63\), 95% CI [2.42, 2.89], \(p < 0.001\) (Fig. 10b). The negative relationship between competence and preferences appeared again: the more competent co-player A appeared relative to co-player B, the less participants tended to favor co-player A, \(\textrm{OR}=0.81\), 95% CI [0.76, 0.88], \(p < 0.001\).

We next fit a joint regression with perceived warmth and competence as predictors, controlling for score. In this model, score significantly and positively correlates with stated preferences, \(p < 0.001\) (Fig. A24). As in Study 1, warmth and competence judgments remain significant predictors of participants’ preferences, with \(p < 0.001\) and \(p = 0.001\), respectively. Once again, perceived warmth demonstrates an effect on stated preferences that exceeds the contributions of score and perceived competence. Social perception enhanced model fit above and beyond that provided by score on its own (H2).

Fig. 11 Relationship between social perception and impression sentiment in Study 2. The sentiment in impressions of different co-players correlated with participants’ evaluations of both a warmth, \(p < 0.001\), and b competence, \(p = 0.037\). Error bands indicate 95% confidence intervals

Impression sentiment At the end of the study, participants recalled \(77.3\%\) of their co-players well enough to describe their impressions through written responses. Again, the warmer participants perceived a co-player throughout the study, the more positively they tended to describe that co-player, \(\beta = 0.12\), 95% CI [0.09, 0.15], \(p < 0.001\) (Fig. 11a). Breaking from the prior study, perceptions of competence exhibited a similar effect on post-game impression sentiment: the more competent an agent seemed, the more positively participants described them, \(\beta = 0.04\), 95% CI [0.00, 0.08], \(p = 0.037\) (Fig. 11b). Both warmth and competence judgments positively correlate with the sentiment expressed in participants’ impressions of the agents (H3).

Fig. 12 Overall pattern of perceived warmth and competence in Study 3. Error bars reflect 95% confidence intervals

3.2.3 Study 3

Our final study tested whether the relationship between social perceptions and subjective preferences translates to a revealed-preference setting. Does social perception continue to predict preferences when individuals face a partner choice?

Social perception As in the previous two studies, the composite warmth and competence measures exhibit high scale reliability, with \(\rho = 0.85\) for the composite warmth measure and \(\rho = 0.86\) for the composite competence measure. Agents prompted distinct warmth and competence profiles depending on their parameterization, just as seen in Studies 1 and 2 (Fig. 12). Participants perceived high-SVO agents as significantly warmer than low-SVO agents, \(F_{1,297} = 103.4\), \(p < 0.001\) (Fig. A25a). Similarly, steady agents came across as significantly more competent than trembling agents, \(F_{1,297} = 35.3\), \(p < 0.001\) (Fig. A25b).

Revealed preferences To compare the performance of social perception against objective metrics, we fit three logistic regressions predicting participants’ (binary) partner choice. We evaluated these models via AIC and Nagelkerke’s \(R^2\) [64].
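A sketch of this model comparison, with Nagelkerke's \(R^2\) computed from the fitted and null log-likelihoods; the synthetic data and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def nagelkerke_r2(fitted, n: int) -> float:
    """Nagelkerke's R^2: Cox-Snell R^2 rescaled so its maximum equals 1."""
    cox_snell = 1 - np.exp(2 * (fitted.llnull - fitted.llf) / n)
    return cox_snell / (1 - np.exp(2 * fitted.llnull / n))

# Synthetic placeholder data for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "chose_partner": rng.integers(0, 2, 300),  # 1 = chose to play with co-player
    "warmth": rng.normal(3, 1, 300),           # composite warmth rating
    "competence": rng.normal(3, 1, 300),       # composite competence rating
})
model = smf.logit("chose_partner ~ warmth + competence", data=df).fit()
print(model.aic, nagelkerke_r2(model, n=len(df)))
```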

Table 8 Metrics for logistic models predicting partner choice in Study 3
Fig. 13 Relationship between social perception and subjective preferences in Study 3, as modeled through logistic regression. Participants’ perceptions of warmth demonstrate a significant relationship with revealed preferences for co-players, \(p < 0.001\). Competence judgments did not significantly correlate with revealed preferences, \(p = 0.44\). a and b depict odds ratios and preference predictions, respectively, from a logistic regression. Error bars and bands indicate 95% confidence intervals

Fig. 14 Relationship between social perception and impression sentiment in Study 3. The sentiment in participants’ impressions of their different co-players correlated with their perceptions of both a warmth, \(p < 0.001\), and b competence, \(p = 0.041\). Error bands reflect 95% confidence intervals

Participants revealed a clear pattern of preferences across the agents (Table 8, top row). In expectation, participants favored the \(\theta = 45^{\circ }\) agents over the \(\theta = 0^{\circ }\) agents, and the \(\theta = 0^{\circ }\), \(\epsilon = 0.5\) agent over the \(\theta = 0^{\circ }\), \(\epsilon = 0\) agent. The model with participant score as the sole predictor fared somewhat worse at predicting preferences (Table 8, middle row). All the same, the pattern from Studies 1 and 2 replicated in Study 3: the higher a participant scored with their co-player, the more likely they were to choose to play with that co-player again, \(\textrm{OR}=1.06\), 95% CI [1.03, 1.08], \(p < 0.001\).

For a third time, social perception offers stronger predictiveness than either score or co-player identity (H1; Table 8, bottom row). The warmer a co-player appeared to participants, the more likely participants were to play another episode with them, \(\textrm{OR}=2.10\), 95% CI [1.69, 2.65], \(p < 0.001\) (Fig. 13b). There was no significant relationship between perceived competence and partner choice, \(p = 0.88\).

We subsequently fit a regression using perceived warmth and competence as predictors and controlling for score. In this model, score significantly and positively predicts revealed preferences, \(p = 0.011\) (Fig. A27). The effect of perceived warmth on preferences remains significant, \(p < 0.001\), whereas competence evaluations fail to correlate significantly with preferences, \(p = 0.44\). Regardless, the independent effect of perceived warmth exceeded the contribution of score. Overall, social perception improved model fit above and beyond that provided by score alone (H2).

Impression sentiment At the end of the study, participants recalled \(94.3\%\) of the agents they encountered well enough to provide their impressions in written descriptions. The warmer participants perceived a co-player, the more positively they tended to describe that co-player, \(\beta = 0.14\), 95% CI [0.10, 0.18], \(p < 0.001\) (Fig. 14a). Despite the lack of correspondence between perceived competence and partner choice, perceptions of competence exhibited a similar effect on post-game impression sentiment: the more competent an agent seemed, the more positively participants described them, \(\beta = 0.04\), 95% CI [0.00, 0.08], \(p = 0.041\) (Fig. 14b). Both dimensions of social perception correlated positively with the sentiment of participants’ impressions toward their co-players (H3).

3.3 Summary

Overall, we find evidence in support of each of our initial hypotheses:

H1

Social perception significantly predicted participants’ preferences for different agents, as measured through both self-report and partner choice. Participants consistently favored agents they perceived as warmer and, to a lesser extent, agents they perceived as less competent.

H2

The predictive power of perceived warmth and competence extended beyond the insight provided by agent performance. Social perception provided more accurate preference predictions than standard indicators of performance, including the amount of reward received and the specific identity of the agent involved in the interaction.

H3

Social perception correlated positively with the sentiment expressed in participants’ verbal impressions of the agents. Participants employed more positive language to discuss agents that they rated higher on warmth and on competence.

In summary, these three studies provide clear evidence linking perceived warmth and competence to human preferences for artificial agents, over and above objective indicators of agent performance.

4 Discussion

Our experiments demonstrate that artificial agents trained with deep reinforcement learning can cooperate and compete with humans in temporally and spatially extended mixed-motive games. Human interactants perceived varying levels of warmth and competence when interacting with agents. Objective features like game score predict humans’ preferences over different agents. However, preference predictions substantially improve by taking into account people’s social perceptions; success in an interaction is driven not just by its objective outcomes, but by its social dimensions, too. This holds true whether examining stated or revealed preferences.

Participants preferred warm agents over cold agents, as hypothesized, but—unexpectedly—our sample favored incompetent agents over competent agents. These patterns offer potential support for the primacy of warmth judgments observed in interpersonal perception [2]. On the other hand, they may also emerge from the particular algorithm and parameter values that we investigated. It would be interesting to train agents using a wider range of parameter values, testing the robustness of these patterns. Such studies could investigate potential compensation effects between agent warmth and competence (e.g., the tendency to perceive incompetent partners as exceptionally warm; [96]) and build a broader mapping from agent parameters to participants’ perceptions and preferences. Are there agents that balance the influence of warmth and competence evaluations on preferences, or is the relative contribution of perceived warmth robust across settings?

Another possible explanation for this pattern stems from the flexible content of “competence” judgments in mixed-motive games [93]. Did the tutorial or the study instructions inadvertently emphasize the competitive elements of Coins? Our study design may have primed participants to be adversarial, and thus to view selfishness as competence. Follow-up research should investigate a more diverse range of incentive structures and tasks to explore the robustness of this pattern [60].

Our results reinforce the generality of warmth and competence. Perceptions of warmth and competence structure impressions of other humans [77], as well as impressions of non-human actors including animals [82], corporations [45], and robots [76, 79]. In combination with recent studies of human-agent interactions in consumer decision-making contexts [34, 46] and the Prisoner’s Dilemma [58], our experiments provide further evidence that warmth and competence organize perceptions of artificial intelligence.

Competitive games have long been a focal point for AI research [13, 83, 85, 94]. We follow recent calls to move AI research beyond competition and toward cooperation [18]. Most interaction research on deep reinforcement learning focuses on pure common-interest games such as Overcooked [14, 91] and Hanabi [87], where coordination remains the predominant challenge. Expanding into mixed-motive games like Coins opens up new challenges related to motive alignment and exploitability. For example, participants who played with (and exploited) altruistic agents expressed guilt and contrition. This echoes findings that—in human-human interactions—exploiting high-warmth individuals prompts self-reproach [4]. At the same time, it conflicts with recent work arguing that humans are “keen to exploit benevolent AI” [43]. Understanding whether these affective patterns generalize to a wider range of mixed-motive environments will be an important next step, particularly given the frequency with which people face mixed-motive interactions in their day-to-day lives [16, 18]. Human-agent interaction research should continue to explore these issues.

Preference elicitation is a vital addition to interactive applications of deep reinforcement learning. Incentivized partner choices can help test whether new algorithms represent innovations people would be motivated to adopt. Though self-report can introduce a risk of experimenter demand, we also find a close correspondence between stated and revealed preferences, suggesting that the preferences individuals self-report in interactions with agents are not entirely “cheap talk” [24]. Stated preferences thus represent a low-cost addition that can still strengthen interaction research relative to sole reliance on objective measures of performance or accuracy. Overall, preference elicitation may prove especially important in contexts where objective metrics for performance are poorly defined or otherwise inadequate (e.g., [74]). In a similar vein, subjective preferences may serve as a valuable objective for optimization. Deep learning researchers have recently begun exploring approaches of this kind. For example, some scientists attribute the recent success of large language models, including the popular system ChatGPT [80], to their use of “reinforcement learning from human feedback” (RLHF) methods. Given a pre-trained model, RLHF applies reinforcement learning to fine-tune the model, optimizing for reward derived from a learned model of human preferences. Of course, these optimization methods carry their own risks. As recognized by Charles Goodhart and Marilyn Strathern, “when a measure becomes a target, it ceases to be a good measure” [35, 90]. Future studies can investigate the viability of such approaches.

Nonetheless, preferences are not a panacea. Measuring subjective preferences can help focus algorithmic development on people’s direct experience with agents, but does not solve the fundamental problem of value alignment: the “question of how to ensure that AI systems are properly aligned with human values and how to guarantee that AI technology remains properly amenable to human control” [32]. In his extensive discussion of value alignment, Gabriel [32] identifies shortcomings with both “objective” metrics and subjective preferences as possible foundations for alignment. Developers should continue to engage with ethicists and social scientists to better understand how to align AI with values like autonomy, cooperation, and trustworthiness.