1 Introduction

Experimental economics is a branch of economics that deals with experiments in lab settings that are designed to simulate individual or group behavior in order to better understand the functioning of economic systems. In many cases, human subjects play games and their decisions during games are studied. Such settings have their limitations in terms of scope of participants, number of iterations of a game they can play, and validity outside the particular lab. Many experiments take place on university campuses and samples are drawn from a limited student population in a specific region of a specific country. The global economy involves a highly diverse population living under different cultural, political, and economic conditions.

Bots play an increasing role in social media and have reached a level of sophistication that sometimes makes their behavior difficult to distinguish from that of humans. This raises the possibility that bots can be created to play the games of experimental economics, with the advantage that settings can reflect realities in different countries and in different economic and political systems. Using psychological instruments, human diversity can be infused into bot behavioral preferences, group sizes can vary far more widely, and the number of iterations can be expanded significantly.

We explore two iterated games that may be associated with economic related decision making, namely the prisoner’s dilemma and the power to take game. The first requires a binary decision and the second requires a decision in a continuous range of values in some interval. In both cases, the iterated nature suggests the possibility of bots producing insights that allow for enhancements of some human decision making processes. Additionally, using personality classification may improve predictability of particular strategies employed by individuals or groups.

As a personality model, we employed the widely used Myers–Briggs Type Indicator (MBTI), which divides the general population along four dichotomous dimensions into 16 mutually exclusive subpopulations with distinct personalities and distinct expected preferences and behaviors under various circumstances [12]. The MBTI dimensions model how different personality types perceive information and how they form judgments about that information. The first dimension, “Introversion” (I) versus “Extroversion” (E), describes how different personalities relate to other people and to the inner versus the outer world. The second dimension, “Intuition” (N) versus “Sensing” (S), concerns the preference for abstract over concrete information. The third dimension, “Thinking” (T) versus “Feeling” (F), concerns judging based on objective, logical criteria versus intra– or interpersonal values. The last dimension, “Perceiving” (P) versus “Judging” (J), concerns spontaneous adaptation versus adherence to plans.

2 Background

2.1 Prisoner’s Dilemma

The prisoner’s dilemma is a game in which two individuals are in jail and are separately interrogated for a suspected crime [14]. Each person may either “cooperate” with the other – saying nothing about any potential crime while being questioned – or “defect” against them by confirming their complicit involvement. Dawkins in [5] offered the following payoff matrix to describe the possible combinations of choices and corresponding outcomes for oneself and another suspect when the prisoner’s dilemma is “iterated”, i.e. repeated several times (Fig. 1).

Fig. 1. Dawkins’ iterated prisoner’s dilemma payoff matrix.

The present study sought to add to the research on the iterated prisoner’s dilemma by adding personality factors as influencing variables. Specifically, two players were created with independently–determined Myers–Briggs personality types, one representing a human user and the other a bot. It was assumed that Myers–Briggs Thinking and Feeling types each have distinctly–preferred strategies while playing the prisoner’s dilemma. Of primary interest was how each type would perform relative to the other given this personality difference between them. Strategies were matched to Thinking and Feeling types based on the answers to questions such as:

  1. Which personality preference is more likely to indicate one’s tendency to cooperate with others?

  2. Which personality preference is more likely to indicate one’s tendency to act in their self-interest?

  3. Which personality preference is more likely to indicate one’s tendency to make peace offerings following mutual conflict?

  4. Which personality preference is more likely to indicate one’s tendency to be wary regarding the intentions of others?

  5. Which personality preference, if any, is more likely to indicate a desire for all parties involved in a situation to benefit (i.e., to strive toward the “common good”)?

Strategies. “Nice” strategies were defined in Dawkins’ [5] sense as those that involve their user “never [being] the first to defect” (p. 12). Opposed to nice strategies are “nasty” strategies, the more regular use of which was further hypothesized to correlate negatively with a player’s score. Dawkins defined nasty strategies as those with which a player would “sometimes defect, however rarely, when not provoked”.

Since users of nice strategies prefer to cooperate, it is arguable that they primarily act in both their and the other player’s interests. This does not hold true for users of nasty strategies, who are comparatively more likely to defect and thus act in their own interest at the expense of the other player’s.

Hypothesis. The hypothesis for this study was that a direct positive correlation would exist between the score of a player (whether automated or non–automated) and the extent to which they employ nice prisoner’s dilemma strategies. This hypothesis reflects Dawkins’ own conclusion, which was that “nice guys do well in this game” (p. 212).

2.2 Power to Take Game

The power to take game is played by two opposing sides: one side plays the role of a take authority that claims a percentage of the other side’s income, and the responding side decides what percentage of its own income to destroy in response to the take rate. The game was used in a laboratory experiment by Bosman and van Winden [1], with students as players, to investigate the impact of emotions on behavior. Initially, it was played by individuals; Bosman et al. [2] later extended the experiment to teams in order to explore group decision making.

Previous Findings in Power to Take Experiments. In [1], Bosman and van Winden found that responders typically destroy nothing or everything. The latter typically happened when the responders experienced negative emotions in response to a take rate that was subjectively too high. In line with this finding, responders’ expectations regarding an acceptable take rate were a significant factor for the probability of destroying income.

In [2], Bosman et al. found that the individual decisions in a group follow the same trend observed in the game played by single players. Additionally, most takers thought a 50% take rate was fair, whereas responders tended to find a take rate of 0% as fair. However, only a small minority of the players were concerned with fairness in the group decision making process. Furthermore, both sides ignored the group decision making process of the other side and viewed the other team as a single entity. Lastly, they observed that group decisions were largely in line with a simple majority rule as well as the average of individual take inputs.

Overall, they came to the conclusion that emotional reactions, which they termed “emotional hazard”, result in decreased efficiency both in the two–player and group power to take game. For group decision making processes it seems to be important to consider both the decision making rule as well as the impact of individual decisions. They pose the question of why there are large differences between individual players and how these differences relate to the group decision making process.

Possible Impact of Personality Type on Group Decision Making. Bosman et al. suggested in [2] that expectations–based classification is an important explanatory variable for destruction rate as well as take rate. This fact may suggest that different personality types will have different expectations and will react differently to different realities. Differences in personality type might partially account for the different importance placed on fairness and other factors during the decision making process of both individuals and groups. The personality type composition of teams could have an impact on intra–group dynamics and could thus result in different group behavior.

Relevance of the Power to Take Game for Economics. The take authority and take rate could be interpreted as a government imposing tax rates on the population, whereby the destroy rate of the responders could represent the amount of effort people will make to reach a certain level of wealth given a tax rate. It is reasonable to assume that under extremely high tax rates a large segment of a highly talented workforce would prefer to reduce its workload to avoid the unattractive prospect of seeing very little reward for maximum effort. When that is the case, the productivity of an economy suffers, and tax collection is reduced. If that does not happen in a certain society, then a 100% tax rate would become a viable option and a taxing authority will likely be happy to impose it. However, in situations where the tax payers respond by reducing their productivity to nearly 0%, such an economy could be destroyed.

In an iterated version of the game, the players do not only respond to the immediate decisions of the opposing side, but also to the game history up to the current game round. This might be similar to tax authorities considering the effectiveness of prior tax rates when deciding to increase or decrease the tax rate. Similarly, tax payers might have expectations regarding an acceptable tax rate based on prior tax rates, as well as based on the reactions of the tax authority to signals of dissatisfaction.

Research Questions. Our interest in this context is to see what kind of personalities, individually, and in groups, would potentially perform either one of the two roles in the game particularly well, and which personalities would underperform in either role. A related question of interest is whether certain compositions of personality types can be linked to increased or decreased performance in the game, and, as an extension, which personalities or composition of personalities would be suitable as tax “collectors” and which would be successful as tax payers. Furthermore, we wish to investigate if there are take rates that overall result in the best (or worst) performance regarding overall economy, tax collection or net earnings. Finally, it is of interest to examine the contribution of adjudication methods, team size, and prior, possibly cultural, history of tax rates to the group decision making process.

3 Methodology of the Prisoner’s Dilemma

3.1 Type Designation and Personality–Based Strategy Assignment

MBTI F types were assigned three nice strategies out of a total of five, meaning that they usually did not defect unless the other player did first. In contrast, T types were assigned three nasty strategies (out of five strategies, total). These decisions were made because “individuals who score the highest on scales of disagreeableness [meaning they are more likely to also score as T rather than F types [10]] appear to others as being conceited, egocentric, competitive, antagonistic, skeptical, overcritical, or distrustful toward rivals” ([11], p. 120). Those who are more likely to score as F types were thus assumed to be more cooperative and trustful toward other players.

Before the games started, the human user and bot (named “Sirius” after this project’s group) were typed independently. The human user was prespecified to always be an F, and Sirius a T. However, along each of the I/E, N/S, and J/P dimensions, players were typed randomly. After the program generated the user and Sirius’ types, they were assigned distinct strategies for the coming set of prisoner’s dilemma rounds based primarily on whether they were Thinking or Feeling types.
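The typing step can be illustrated with a minimal sketch. The function name `random_type` and the uniform draw for each free letter are assumptions made for illustration; the text only states that the T/F letter was fixed and the remaining dimensions were typed randomly.

```python
import random


def random_type(fixed_tf: str, rng: random.Random) -> str:
    """Return a four-letter MBTI code whose T/F letter is fixed.

    The other three letters are drawn uniformly at random, which is an
    assumption; the source only says the players were "typed randomly".
    """
    if fixed_tf not in ("T", "F"):
        raise ValueError("fixed_tf must be 'T' or 'F'")
    return rng.choice("IE") + rng.choice("NS") + fixed_tf + rng.choice("JP")


rng = random.Random(0)  # seeded only to make the sketch reproducible
user_type = random_type("F", rng)    # the human user is always an F
sirius_type = random_type("T", rng)  # the bot "Sirius" is always a T
```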

Some strategies (i.e., “Tit for Tat” and “Random”) were possible for both players regardless of their types, whereas others were only possible for either T or F types. For instance, “Always Defect” was only possible for the former and “Remorseful Prober” was only possible for the latter. F strategies included Tit for Tat, “Naive Peacemaker”, Remorseful Prober, “Always Cooperate”, and Random; T strategies included Tit for Tat, “Tit for Two Tats”, “Naive Prober”, Always Defect, and Random. Descriptions of each of these strategies were borrowed from existing literature and work surrounding the iterated prisoner’s dilemma [4, 7].
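To make the notion of a strategy concrete, the following sketch plays three of the named strategies against each other. It is not the simulator used in the study [4]. The payoff values are the conventional ones from Axelrod's tournaments and are an assumption here, since they are not taken from Fig. 1; the function names are likewise hypothetical.

```python
# Conventional payoffs (assumption, not read from Fig. 1):
# (my move, other's move) -> (my points, other's points)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the opponent's previous move."""
    return other_history[-1] if other_history else "C"


def always_defect(own_history, other_history):
    return "D"


def always_cooperate(own_history, other_history):
    return "C"


def play(strategy_a, strategy_b, rounds):
    """Play an iterated game and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Under these assumed payoffs, `play(tit_for_tat, always_defect, 10)` returns `(9, 14)`: Tit for Tat is exploited once in the first round and both players defect afterwards.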

3.2 Simulation

Following execution of the designation program, strategies were inputted into a minimally–adapted version of a Web–based prisoner’s dilemma simulator [4] (Fig. 2). Since multiple strategies were assigned uniquely to both T and F players and the comparative performance of these two types was of primary interest, 50 iterations per type and strategy designation were run.

Fig. 2. First round of the iterated prisoner’s dilemma (human strategy on the left; bot on the right).

Each simulation was followed by the appearance of a dialog box which showed the game’s final outcome.

4 Methodology of the Power to Take Game

In this study we simulated iterated power to take games with bots as players in order to examine factors and behaviors that might be relevant for group decision making processes and reactions to taxation circumstances over time. The investigated factors include personality type, team size, cultural background, adjudication method and team composition. Differences in behavior can be caused by, among other things, different personality traits that have been measured in models such as the Five Factor Model (FFM) [10]. For this study, a similar instrument, the Myers–Briggs Type Indicator (MBTI) [9], that nevertheless taps into personality traits described by the FFM [10] was employed to model automated bot behavior after human behavior in the power to take game.

4.1 Bias–Based Reasoning

In this study, we decided to implement a bias–based Knowledge Based Expert System (KBES) as described by Hancock [6]. The expert system allowed us to embed several heuristics for strategies into the player bots that might be employed by humans in the power to take game. Just as experts are biased by a priori beliefs, bias–based reasoning was used to determine the bots’ a priori biases in making game–related decisions. This enabled the bots to react dynamically to changing game situations based on their personality–based preferences and disinclinations towards different factors and strategies.

A collection of preferences \(p_i \in [0,1]\) for a certain factor or strategy and a collection of disinclinations \(d_i \in [0,1]\) for that same factor or strategy are accumulated into the overall preference p and disinclination d as follows:

$$\begin{aligned} p= & {} p + p_i\ (1-p) \nonumber \\ d= & {} d + d_i\ (1-d) \end{aligned}$$

with \(p := 0\) and \(d := 0\) at the beginning. Both p and d therefore remain in \([0,1]\).
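The accumulation rule can be sketched in a few lines. The function name `accumulate` is hypothetical; the update itself follows the formula above, and the same function serves for preferences and disinclinations.

```python
def accumulate(values):
    """Fold individual values in [0, 1] into one overall value in [0, 1]."""
    total = 0.0
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError("each value must lie in [0, 1]")
        # Each new value closes a fraction v of the remaining gap to 1.
        total = total + v * (1.0 - total)
    return total


p = accumulate([0.5, 0.5])  # 0.5, then 0.5 + 0.5 * 0.5 = 0.75
```

Because each step shrinks the remaining gap to 1 by the factor \(1 - p_i\), the result equals \(1 - \prod_i (1 - p_i)\) and does not depend on the order in which the values are accumulated.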

This method was used to compute the strength of the bots’ reactions to changing circumstances. Additionally, it was employed to determine their preference for certain personality–based factors we deemed relevant for behavior during the power to take game.

Given the preference p and disinclination d of a bot towards a certain factor or situation, the attitude \(\varDelta \) of a bot was calculated as follows:

$$\begin{aligned} \varDelta := p - d \end{aligned}$$

\(\varDelta \) assumes values between \(-1\) and 1, whereby values \({>}0\) imply a preference for the factor in question, whereas values \({<}0\) imply a preference for the opposite. Since we were mainly interested in the positive preference towards a specific factor and not its opposite, we treated values \({<}0\) as 0.
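As a small illustration of the clamping step, the attitude can be written as follows; the function name `attitude` is an assumption.

```python
def attitude(p: float, d: float) -> float:
    """Return p - d, with negative values treated as 0."""
    return max(0.0, p - d)


leaning_towards = attitude(0.75, 0.25)  # preference outweighs disinclination
leaning_against = attitude(0.2, 0.6)    # negative difference is clamped to 0
```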

4.2 Game Basics

In the first step, the game loads a configuration file that specifies: the number of rounds, the amount of money each round, adjudication method, prior take rate history, exact team composition in terms of MBTI type, the personality extremity, and the minimum variability that each bot exhibits. Based on these specifications, the bots are initialized for each team and play the game as described below.

Bot Creation. Given the MBTI four letter code, as well as the two parameters lower bound \(l \in [0,1]\) and upper bound \(u \in [0,1]\) that describe how extreme the personality is supposed to be:

  1. For each dimension with n questions, randomly generate between \(l\cdot n\) and \(u \cdot n\) positive answers to the corresponding questions in the MBTI questionnaire. The random generation is uniform.

     For example, for an introvert and the I–E dimension with \(n=10\) questions, \(l=0.8\) and \(u=0.9\), randomly generate 8 or 9 introvert answers plus, correspondingly, 2 or 1 extrovert answers.

  2. Based on the bot’s specific answers to the MBTI questionnaire, compute various aspects of personality that are relevant to determine employed strategies.
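The first step of bot creation might look like the sketch below. The function name `generate_answers`, the rounding of \(l\cdot n\) and \(u\cdot n\) to whole answers, and the random choice of which questions receive the positive answers are all assumptions; the text specifies only the uniform draw of the number of positive answers.

```python
import math
import random


def generate_answers(n, lower, upper, rng):
    """Return n booleans; True marks an answer for the bot's own pole."""
    # Small tolerances guard against floating-point noise in lower * n.
    lo = math.ceil(lower * n - 1e-9)
    hi = math.floor(upper * n + 1e-9)
    if lo > hi:
        raise ValueError("no whole number of answers fits the bounds")
    k = rng.randint(lo, hi)  # uniform over lo..hi, both inclusive
    answers = [True] * k + [False] * (n - k)
    rng.shuffle(answers)  # assumption: positions are assigned at random
    return answers


# Introvert on the I-E dimension: 8 or 9 introvert answers out of 10.
answers = generate_answers(10, 0.8, 0.9, random.Random(1))
```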

Game Loop. In each game round:

  1. Each responder bot earns the specified amount of money.

  2. Taker bot decision:

     (a) The taker bots decide individually what percentage of the responder bots’ income they want to claim.

     (b) The game collects these individual taker decisions.

     (c) Each taker bot decides if it wants to adjust its decision based on the opinions of its team members.

  3. The game adjudicates the final take rate.

  4. Responder bot decision:

     (a) The responder bots decide individually how much of their own income they want to destroy.

     (b) The game collects these individual responder decisions.

     (c) Each responder bot decides if it wants to adjust its decision based on the opinions of its team members.

  5. Remove the destroyed income from the responder bots.

  6. Apply the take rate to the remaining responder bot income and add the amount taken to the collective money pool of the taker bots.
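The money flow in the last two steps of a round can be sketched as follows. The function name `settle_round` is hypothetical, and the sketch covers a single responder with already adjudicated rates.

```python
def settle_round(income, take_rate, destroy_rate):
    """Return (responder net earnings, amount collected by the takers)."""
    remaining = income * (1.0 - destroy_rate)  # destruction happens first
    collected = remaining * take_rate          # take rate applies to the rest
    return remaining - collected, collected


# Income 100, 20% destroyed, 50% of the remainder taken.
net, collected = settle_round(100.0, take_rate=0.5, destroy_rate=0.2)
```

In this example 80 units remain after destruction, of which 40 go to the taker pool and 40 stay with the responder. Destroyed income therefore lowers both the tax collection and the responder's own net earnings.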

4.3 Relevant Personality Type–Based Factors

In similar studies involving MBTI and bots, strategies were assigned to the bots based on dimensions alone, sometimes taking into consideration the strength of the preference for that dimension [16]. However, after looking at a version of the MBTI test, different questions in the same MBTI dimension appeared to imply different preferences for strategies in the power to take game. Thus, going by dimensions alone seemed insufficient for this context. On the other hand, it did not seem to be a good idea either to look at each test question individually in order to extrapolate entire strategies from that single question alone.

Therefore, we focused on several factors related to the information that was available to the bots during the game, or that were identified in previous studies as potentially relevant for the power to take game [1, 2]. We then looked at each question in the MBTI test and assigned values representing whether a particular answer implied a preference or disinclination towards that factor, and how pronounced that preference or disinclination was. Bias–based reasoning was then used to compute the values of these factors for each bot during bot creation based on the specific answers it selected on the MBTI test. Some factors required additional computations that are explained below. An overview of the factors and their value ranges in extreme personalities can be found in Table 1.

Table 1. Factors relevant for the power to take game in MBTI types with 100% in each dimension. The value range of each factor is \([0,1]\).

Consensus. In a group setting like our version of the power to take game, decisions are not made by individuals in a vacuum. Instead, individuals are exposed to the opinions of their team members. To what degree they are swayed by their team’s opinions partly depends on their personality type. The Thinking versus Feeling MBTI dimension seemed to be especially relevant for this factor.

The consensus factor ranges in value from 0 to 1, whereby 0 implies that the bot is not influenced by its team’s opinions at all, whereas 1 implies that it wants to go entirely with the group consensus.

Variability. Human beings are not perfectly consistent in their behavior, and exhibit some degree of variability in their decisions. This is partly caused by factors unknown to the observer, but seems to be influenced by personality type as well. This factor seemed to be largely influenced by the MBTI Judging versus Perceiving dimension.

Given a base variability value \(b \in [0,0.5]\) and the bot personality’s difference between preference and disinclination \(\varDelta _{p,d} \in [0,1]\) for variability, the variability v of a bot was computed as

$$\begin{aligned} v := (1 + \varDelta _{p,d})\ b \end{aligned}$$

The variability factor thus ranges in value from b to \(2b \in [0,1]\), with higher values implying higher variability.

Experiment. This factor is similar to variability but taps into other aspects of personality as well. Namely, it describes how much a bot prefers to stick to one strategy or experiment with different approaches. The mainly relevant MBTI dimensions seemed to be Judging versus Perceiving and Sensing versus Intuition.

The experiment factor ranges in value from 0 to 1, whereby 0 implies a complete disinclination for experimentation, whereas 1 implies that experimentation is extremely important to the bot.

Fairness and Greed. Bosman et al. identified fairness as a potential factor in the power to take game that was considered by a minority of the players [2]. This factor describes how much the bot values fairness versus maximizing its own gain during the decision making process. Thinking versus Feeling seemed to be the most relevant MBTI dimension.

The fairness factor ranges in value from 0 to 1, where 0 implies a greedy personality with complete disregard for fairness, whereas 1 implies a very fair bot that cares little about maximizing its own gain at the expense of others.

Tradition. Bosman and van Winden found that prior expectations were a significant factor for the probability of destroying income in the power to take game [1]. In our implementation, we added a prior take rate history to the game in order to give the bots expectations at the beginning of the game (Table 2). This factor describes how important these expectations are for the bots. This seemed to be mainly influenced by the Intuition versus Sensing and Judging versus Perceiving MBTI dimensions.

However, the relevance of this prior take history likely declines during the course of the game compared to other factors. Given the bot personality’s difference between preference and disinclination \(\varDelta _{p,d} \in [0,1]\) for tradition, the tradition factor t of a bot in round i of a total of N rounds was computed as

$$\begin{aligned} t := \frac{1}{2} \left( t + t\ \frac{N - i}{N} \right) \end{aligned}$$

The tradition factor ranges in value from 0 to 1, whereby 0 suggests that the bot does not care at all about conditions before the game start, whereas 1 suggests a strong expectation that these initial conditions are maintained.

Table 2. Power to take: Income tax rates in percent in the United States of America, Germany, United Kingdom, Japan and Russia (Source: KPMG [8]).

Information. The game is potentially played over a large number of rounds. It seemed unlikely to us that human players would remember the entire game history perfectly, or care for it. Instead, we assumed that recall would be better for the most recent rounds and that some personality types would consider more information than others. This seemed to be mainly influenced by the Intuition versus Sensing, Perceiving versus Judging and Thinking versus Feeling MBTI dimensions.

The information factor ranges in value from 0 to 1, where 0 implies that the bot only looks at immediate information, whereas 1 implies consideration of as much information as possible.

Reactivity. “Emotional hazard” was identified by Bosman and van Winden as a factor that decreased efficiency in power to take games with human players, since responders chose to destroy everything when experiencing negative emotions [1]. Therefore, we decided to introduce the reactivity factor that describes how strongly a bot is going to react if something goes against its wishes or expectations, or in other words, how volatile it is. The main relevant MBTI dimensions seemed to be Judging versus Perceiving and Thinking versus Feeling.

The reactivity factor ranges in value from 0 to 1, whereby 0 suggests no reaction in response to seemingly upsetting situations, whereas 1 suggests extreme reactions.

4.4 Decision Making

Individual Decision Making. Each rule \(i \in \{1,\ldots ,n\}\) generates a proposed take or destroy rate \(x_i\) and corresponding weight \(w_i\). The bot’s preliminary decision \(\bar{x}\) is the weighted mean

$$\begin{aligned} \bar{x} := \frac{\sum _{i=1}^n w_i x_i}{\sum _{i=1}^n w_i} \end{aligned}$$

Since the weights depend on the personality factors, each proposed take or destroy rate contributes to the preliminary decision only to the extent that it is of interest to the bot, and it is possible that the output of some rules is completely irrelevant for a bot.
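The weighted mean can be sketched as below. The function name `preliminary_decision` and the example numbers are assumptions chosen only to show how a rule with weight 0 drops out of the decision.

```python
def preliminary_decision(proposals, weights):
    """Weighted mean of the rates proposed by the individual rules."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one rule needs a positive weight")
    return sum(w * x for x, w in zip(proposals, weights)) / total_weight


# Three rules propose rates 0.3, 1.0 and 0.5; the second rule has
# weight 0 for this bot and therefore does not affect the result.
x_bar = preliminary_decision([0.3, 1.0, 0.5], [0.5, 0.0, 0.5])
```

Here the result is 0.4, the plain average of the two proposals that carry weight.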

Taker Strategies. A simple rule that fires each round is the “tradition rule” that casts a vote for the mean of the cultural background take rate. The importance of this rule depends entirely on how important the tradition factor is for the bot.

The “greed rule” is another simple rule that is about maximizing the collection by maximizing the take rate. The weight of this rule depends completely on how greedy the bot is.

The “fairness rule” is slightly more complex and attempts to propose a fair take rate that takes into consideration the bot’s own expectations as well as the reactions of the responder team. The importance of this rule hinges on the fairness factor.

The “upset rule” is designed to add potentially irrational behavior in reaction to negative emotions caused by unmet expectations such as high destroy rates in response to culturally acceptable take rates, perceived fair take rates and exploitation of good will. Upset is accumulated using bias–based reasoning and depends on how important these factors are to the bot. How much the taker bot wants to increase the take rate depends on how upset it is, and the weight of this increase depends on the reactivity factor.

The most complex rule is the “strategy rule”. It contains common sense reasoning such as not increasing the take rate once the goal take rate is achieved, detecting and countering manipulation by the responder team as well as the employment of one of two different strategies that are selected based on the bot’s personality. Bots that are below average in the experiment factor try to introduce higher take rates with small, incremental increases, whereas bots above average in the experiment factor try to achieve their goal take rate by experimenting with larger, but targeted increases in take rate. The weight of these strategies depends on the greed, experiment and information factors as these seemed to be the factors most relevant for opportunistic and strategic thinking.

Responder Strategies. The responder bots possess a “greed rule” as well that attempts to maximize net earnings by minimizing the destruction. The degree by which the destroy rate is lowered and the weight of the proposed destroy rate depend on the greed factor.

The responders’ “upset rule” is similar to that of the taker bots and leads to high destroy rates when the bot exhibits a high reactivity and the take rate deviates too much from the cultural background, perceived fair take rate or when lowering of the destroy rate was exploited by the taker team.

The “strategy rule” implements more sophisticated strategic thinking that uses the destroy rate as a signal of what the responders consider to be an acceptable take rate. Additionally, it attempts to detect and circumvent taker strategies aiming to increase the take rate. The weight of this rule depends on the greed, experiment and information factors.

Adjustment of Individual Decisions Based on Group Opinion. After the bots have made their individual decisions, they get the opportunity to adjust their opinion based on the mean of the individual decisions in the entire team. In our model, the inclination of a bot to reconsider its opinion depended on the consensus factor as well as the (un–)certainty regarding its own opinion. The closer the take or destroy rates proposed by the different rules, the more certain the bot is that it made the right decision, whereas a wide spread in possible rates leaves the bot conflicted and uncertain.

Given the proposed rates \(x_i\) and corresponding weights \(w_i\) of rules \(i\in \{1,\ldots ,n\}\), as well as the number of non–zero weights m and the weighted mean \(\bar{x}\), the uncertainty u is the weighted sample standard deviation:

$$\begin{aligned} u := \min \left( 0.5, \sqrt{\frac{\sum _{i = 1}^n w_i\cdot (x_i - \bar{x})^2}{\frac{m-1}{m}\sum _{i=1}^n w_i}} \right) \end{aligned}$$

With the consensus factor c, the adjustment rate a is computed as follows

$$\begin{aligned} a := \frac{1}{2} c + u \end{aligned}$$

Given the mean of the individual decisions in the group g and the bot’s own, preliminary decision \(\bar{x}\), the final decision d is then

$$\begin{aligned} d := ag + (1-a)\ \bar{x} \end{aligned}$$

Lastly, the variability v is applied to the reconsidered decision by choosing a uniformly random number from the interval \([\max (0, d - v), \min (1, d + v)]\).
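The adjustment procedure can be sketched end to end as follows. All function names are hypothetical, and returning an uncertainty of 0 when fewer than two rules have non-zero weight is an assumption, since the formula for u is undefined for \(m = 1\).

```python
import math
import random


def weighted_mean(proposals, weights):
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one rule needs a positive weight")
    return sum(w * x for x, w in zip(proposals, weights)) / total


def uncertainty(proposals, weights):
    """Weighted sample standard deviation of the proposals, capped at 0.5."""
    m = sum(1 for w in weights if w > 0)
    if m < 2:
        return 0.0  # assumption: a single active rule has no spread
    x_bar = weighted_mean(proposals, weights)
    spread = sum(w * (x - x_bar) ** 2 for x, w in zip(proposals, weights))
    return min(0.5, math.sqrt(spread / ((m - 1) / m * sum(weights))))


def reconsidered_decision(proposals, weights, consensus, group_mean):
    """Blend the bot's own decision with the group mean."""
    a = 0.5 * consensus + uncertainty(proposals, weights)  # a lies in [0, 1]
    return a * group_mean + (1.0 - a) * weighted_mean(proposals, weights)


def apply_variability(decision, variability, rng):
    """Draw uniformly around the decision, clipped to the range [0, 1]."""
    low = max(0.0, decision - variability)
    high = min(1.0, decision + variability)
    return rng.uniform(low, high)


d = reconsidered_decision([0.2, 0.6], [1.0, 1.0], consensus=0.4, group_mean=0.8)
final = apply_variability(d, 0.1, random.Random(0))
```

In this example the two rules propose 0.2 and 0.6 with equal weight, so \(\bar{x} = 0.4\) and \(u \approx 0.283\). With \(c = 0.4\) the adjustment rate is \(a \approx 0.483\), which moves the decision from 0.4 to roughly 0.593, about halfway towards the group mean of 0.8.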

4.5 Experiment Design

In a first experiment, we investigated the performance of teams consisting of only one MBTI type with extreme personalities (0.8–0.9 in each dimension) playing other such teams, with different combinations of team size and cultural background.

In a second experiment, we introduced two additional team compositions, namely a team consisting of the real MBTI type distribution in the general U.S. population [13] and a team consisting of a random sample of that real distribution. We constructed all of these teams both with a personality extremity of 0.8–0.9 and a more realistic 0.55–1.0 as control, while keeping the team size constant.

In a third experiment, mixed teams of the best and worst takers and best and worst responders played against each other as well as against a sample of the real distribution. See Table 3 for the precise experiment design.

Table 3. Experiment design for the power to take game.

4.6 Implementation

Game configuration files were generated in YAML format. Configuration file generation, bot creation, the power to take game and experiment analyses were all implemented in Python 3 [15]. The simulations were run and tested on Windows 10.

5 Results

5.1 Prisoner’s Dilemma

The mean score was 110.8 (standard deviation 43.4) for the Feeling types and 130.8 (standard deviation 40.6) for the Thinking types. A one–tailed t–test is significant at \(p=0.037\), suggesting improved expected performance of the T type personality over the F type (Table 4).

Table 4. Results of 30 iterated Prisoner’s Dilemma simulations.

5.2 Power to Take

Examination of the factor value ranges of extreme personalities in Table 1 shows that all T types have no interest in consensus or fairness, are not reactive and prioritize maximizing their own gain. In addition, N types are not interested in tradition, which is also the case for STPs. SJs exhibited the highest interest in tradition and SFPs exhibited a small amount. It is notable that all types are greedy to some degree and no type is perfectly fair, with the NFPs being the most concerned with fairness followed by the SFPs.

Experiment 1. The results of experiment 1 identify ENTPs and INTPs as the best taker personalities in terms of tax collection rate, and ISFJs and ESFJs as the worst. Regarding net earnings, the best responder personalities are INTJs and ESTPs, and the worst are ENFPs and INFPs. These trends hold across different team sizes and cultural backgrounds.

The best matchup for takers in terms of tax collection rate was against ISTJ responders. An examination of matchups between ENTPs and ISTJs clearly demonstrates the ability of the ENTP taker bots to impose high take rates on the ISTJ responder bots without triggering high destroy rates. In contrast, the worst matchup for taker bots is against ENFP responders, since these resort to high destroy rates that reduce the size of the economy, the tax collection of the takers and the net earnings of the ENFP responders themselves; a clear loss for all parties involved. The best overall matchup for responders in terms of net earnings was against ISFJ takers.

Experiment 2. Matchups of extreme personality types (0.8–0.9 in each dimension) versus the real type distribution in the general human population with actual type extremity (0.55–1.0 in each dimension) exhibited the same patterns observed in experiment 1 (Table 5). Again, the ENTPs and INTPs were the best tax collectors, while ESFJ and ISFJ taker bots achieved the highest economy rate at the expense of their tax collection. As in experiment 1, the ENTP taker bots managed to establish extremely high take rates, resulting in high tax collection rates, while provoking only a moderate level of resistance from the general population.

ENFP and INFP responder bots destroyed the most income against a taking authority from the general population, whereas ISTJ and ESTJ responder bots destroyed the least income. Unlike in experiment 1, ENTJ and ESTP responder bots performed the best in terms of maximizing net earnings for themselves, while ENFP and INFP responder bots performed the worst, which is likely caused by resorting to high destroy rates. These trends held across different matchups and other conditions under investigation.

It is important to note that both the real distribution and sample real distribution responders ranked in the middle tertile in terms of net earnings and destruction rate. This was also the case for such takers regarding economy rate and collection rate.

Table 5. Power to take, experiment 2: Overall ranking of takers and responders with type extremity of 0.8–0.9 versus the real type distribution with type extremity of 0.55–1.0. The rates are the mean of what the teams achieved compared to what was possible in the corresponding games. The highest rates are marked in light gray, whereas the lowest rates are marked in middle gray.

Table 6 presents an interesting result concerning the relationship between take rate and economy rate. The significant Pearson correlation of \(r = 0.4199\) (\(p < 0.001\)) indicates a general trend of higher take rates being associated with larger economies. However, this trend does not extend to the full range of take rates. In fact, the worst take rates in terms of economy rate are the extremely low and the extremely high ones, which is both expected and desirable.

Experiment 3. When teams composed of different personalities are matched against each other, we can observe a moderating impact of the adjudication method. This is largely because teams of mixed personalities tend to exhibit higher intra–group variability, so the adjudication method makes more of a difference as the group size increases.

Table 6. Power to take, experiment 2: Ranking of take rates regarding economy rate, collection rate and net earning rate with the real type distribution with type extremity of 0.55–1.0 as responders. The economy, collection and net earning rates are the mean of the rates in reaction to the respective take rate. Given are the Pearson’s correlation coefficient r and the p–value of the correlation based on a two–sided t–test.

6 Discussion

6.1 Prisoner’s Dilemma

The simulation results appear to suggest an association between one dimension of MBTI and the expected reward in an iterated prisoner’s dilemma. In particular, it appears that a T type personality would be expected to outperform an F type personality. If true, that would be inconsistent with Dawkins’ findings suggesting that nice players win. As in many cases, conditioning on different assumptions and considering a different set of strategies for the T type and the F type in the PD game, with a somewhat different overlap, may lead to different results. The practical implication of this part of the study is therefore that using bots in this way can be of value for personality related binary decision–making processes. Enhancements of this kind to binary decision–making processes may have limited power and may be realized only after a large number of iterations.

6.2 Power to Take

Results based on 41,320 configurations, each played for 100 rounds, appear to suggest that personality type matters for this game and, by extension, in the context of taxing authorities interacting with taxpayers. This means that the choice of particular personalities could lead to improved or reduced performance in any of the measures of interest, i.e. total tax collection, size of the economy and net earnings, compared to other personality choices or to the general population with its mixed personality type composition.

It may be worth noting that prior studies, primarily involving students playing the game individually or in groups, obtained results with somewhat undesirable characteristics, namely destruction rates that tended to be either 0% or 100%, which led to the responders’ decisions appearing to be essentially binary in those studies. This binary approach by a responder (taxpayer) may be rational if the game is played once, since whenever the tax rate is under 100%, responders maximize their net earnings for the game by destroying 0%.
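The one-shot argument can be checked numerically. The sketch below assumes the usual power to take payoff structure, in which the responder keeps the share of the undestroyed income that is not taken. Since this section does not restate the payoff formula, that formula and the names `responder_net_earnings` and `best_destroy_rate` are assumptions made for illustration.

```python
def responder_net_earnings(income, take_rate, destroy_rate):
    """Responder payoff in a single round (assumed payoff structure)."""
    remaining = income * (1.0 - destroy_rate)  # income left after destruction
    return remaining * (1.0 - take_rate)       # share not collected by the taker

def best_destroy_rate(income, take_rate, steps=100):
    """Grid-search the destroy rate that maximizes one-shot net earnings."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda d: responder_net_earnings(income, take_rate, d))

# For every take rate below 100%, destroying nothing is the best response.
for take_rate in (0.0, 0.25, 0.5, 0.9, 0.99):
    print(take_rate, best_destroy_rate(100.0, take_rate))
```

At a take rate of exactly 100% every destroy rate yields zero net earnings, so the responder is indifferent in purely monetary terms; this is the only point at which full destruction is not strictly worse in a single round.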

However, taxing circumstances repeat, and in reality tax collection is a continuous function of the tax rate, so an iterated approach may be more appropriate for understanding the evolution of a tax system, with bot–based systems offering an advantage in the scope that can be examined. In a large number of the configurations played, the prevailing trends were increasing tax rates, decreasing destruction rates and decreasing net earnings over time. Such circumstances are not unrealistic in the real world, so the rules used for bot behaviors and the aggregate belief and disbelief method may be fairly effective for this problem.

Nevertheless, a potential problem is that we observed a deviation in net earning ranking from the average household income ranking in the general population [3]. This could be an artifact of using extreme personality types versus a real type distribution, but could also be an indication that additional or different factors and strategies might need to be considered, or that our interpretation of the relationship between MBTI test answers and bot behavior does not reflect reality.

In addition, the validity of relating the results of such games, as obtained with our approach, to economics may be limited. An economy is dynamic, and iterating an identical game a large number of times may not always be a good model. There were elements of variability infused into behaviors and rules, but the essential rules of the games remained the same, namely the payoff matrix for the prisoner’s dilemma and the wealth initially awarded to each responder in each round of the power to take game.

7 Conclusion

This study suggests that personality may play a role in both binary and continuous decision–making processes, in both single–player and group settings, and that an iterated approach using bots in the role of human players may enhance our understanding of human decisions, potentially leading to better processes and better outcomes.

Future research may elect to examine different personality instruments, perhaps based on the Five Factor Model, and to infuse dynamics into the economy. Furthermore, it might be worth examining a variant of the iterated power to take game in which the take authority has an income that is subject to the take rate, just like the responders. That way, the take authority would not just be concerned with maximizing the take rate but would also take into consideration net earnings and the overall size of the economy, similar to taxing authorities in an economy.