1 Introduction

The scarcity of resources is one of modern-day society’s most prominent challenges. With strong population growth and a continuing shift toward urbanization, our finite natural and infrastructure resources are under unprecedented stress. The need to devise fair and efficient means of access to these resources is now more pressing than ever.

In this paper, we study a class of dynamic resource allocation problems in which an indivisible resource is repeatedly contested between two anonymous users who are randomly drawn from a large population. Figure 1 illustrates three motivating examples of this class of resource competitions.

  (a) Due to excessive demand, only one of two trip requests from ride-hailing riders can be served by the closest ride-hailing driver. Which rider should be served?

  (b) Two autonomous vehicles (AVs) meet at an unsignalled intersection. Which AV should go first?

  (c) To improve quality of service for critical content, an internet service provider (ISP) splits its bandwidth into a high-capacity fast channel and a low-capacity slow channel, and dedicates the fast channel to half of the total traffic. Two internet content providers (ICPs) simultaneously request service. Which ICP should be granted access to the fast channel?

Fig. 1

Three motivating examples for karma mechanisms. a When demand exceeds supply, which trip request should a ride-hailing platform assign to the next available driver? b In a future of autonomous vehicles (AVs), can the AVs self-coordinate who passes first in intersections in a fair and efficient manner? c Can an internet service provider (ISP) dedicate a fast channel for critical content in a net-neutral manner, i.e., without charging internet content providers (ICPs) differently?

These examples share a common feature: the resource (the ride-hailing driver, the intersection, or the fast channel) is repeatedly contested over a long time horizon by anonymous users in a large population (the ride-hailing riders, the AVs, or the ICPs), who can have time-varying private needs for accessing the resource. It is important to note that the reality of these examples is complex, and each deserves a separate treatment. Nonetheless, the level of abstraction chosen in this paper serves to highlight the fundamental trade-offs that arise in this class of problems, as well as to ease the presentation of the novel analysis tools developed to study them. We adopt a level of generality in these tools that allows them to be readily adapted to more complex settings.

One can draw inspiration from how small communities sometimes manage to self-organize access to common resources [41]. Typically, these communities devise systems that facilitate a form of fair ‘giving and taking,’ i.e., systems in which the users take turns accessing the resource (see, e.g., the system devised by local fishermen to manage the inshore fishery in Alanya, Turkey [3, 41]). This work attempts to systematize this ‘give and take’ so that it can be applied unambiguously in a large-scale system, and builds upon the exploratory concept proposed in [13]. The main enabler is karma, a counter that encodes the history of ‘giving and taking’ of the users. Loosely inspired by the notion of karma in Indian tradition [39], each user is endowed with karma points which increase when the user yields the resource and decrease when the user accesses the resource. A user with a high level of karma has likely yielded in the past and gets an advantage in receiving the resource now. In turn, the disfavored user who yields now is compensated in karma, which will give them an advantage in the future.

In its simplest form, such a karma mechanism is an effective means of facilitating turn-taking at a large scale. But a karma mechanism can do more than simple turn-taking. If users are given the option to choose how much karma to use now, e.g., through an auction-like karma bidding scheme, karma also becomes a means to express private temporal preferences. For example, a user may decide to yield (and gain karma) when its urgency is low, in anticipation of situations where accessing the resource is time-critical. A karma mechanism can hence facilitate the allocation of the resource to whoever needs it most, i.e., the maximization of resource allocation efficiency.

A classical device used to express preferences and facilitate access to resources is money. The ride-hailing platform can raise the trip price until only one of the contesting riders is willing to pay for it—a common practice referred to as surge pricing [10, 12]. This practice has seen some public criticism due to a lack of transparency and a tendency to raise prices in a manner that is deemed unfair [15, 51]. While in principle it is meant to allocate trips to the neediest riders, in practice the trips are allocated simply to those who can afford them. In the transportation domain, decades of research on monetary road pricing policies have been met with little public enthusiasm due to concerns for equitable access to the roads [7, 59, 60]. A popular remedy is the use of tradable credits, which are periodically issued road access tokens that are allowed to be traded in a monetary market. While these schemes ensure that the yielding users can at least sell their credits and receive (monetary) compensation, they neglect the fact that wealthy users (i.e., those with a high ‘value of time’ [5]) continue to have a systematic advantage in accessing the resource [59]. Finally, the topic of net neutrality has seen widespread public debate in recent years, with strong concerns that the internet will lose its integrity as an open and free resource if ISPs charge ICPs differently [6, 30, 37, 42]. In all of the above debates, the potential existence of a simple and efficient non-monetary solution seems to be overlooked.

Karma shares similarities with money in how it acts as a token of exchange, but has the distinguishing feature of being acquired from fair exchanges that are relevant to the resource allocation problem at hand. One need not worry about matters of wealth inequality, or rely on the assumption that money is a universal, objective measure of value, since karma is only acquired from the process of yielding the resource to another user. Karma hence facilitates the design of a purpose-built, self-contained economy for the resource allocation task. Like in monetary economies, the karma economy can be tuned to achieve different fairness and efficiency objectives in a manner that is targeted to the specific resource allocation task, through the design of karma payment rules, the redistribution of karma, and other techniques that we explore here.

Karma mechanisms promise to be efficient and fair, but these unconventional mechanisms require a novel analysis. A difficulty in the analysis arises due to the lack of reliance on an extrinsic measure of value. Karma does not have value a priori and is never used directly in the cost functions of the users. The value of karma instead arises from how it facilitates access to the resource, and how users need to ration its use in order to cover their future resource access demands. This makes the behavior of rational users under the karma mechanism, and the resulting social welfare, difficult to predict, and requires the formulation of non-trivial dynamic games played in large populations. In this work, and in comparison to [13], we develop a tractable and rigorous game-theoretic model to study karma mechanisms that is built on top of the class of dynamic population games [17], and prove the existence of a suitable notion of equilibrium, the stationary Nash equilibrium. We then utilize these technical tools to numerically investigate the strategic behaviors that emerge under the karma mechanisms, their consequences for the social welfare, as well as how the mechanisms can be tuned to achieve different resource allocation objectives.

1.1 Related Works

1.1.1 Repeated Games

The celebrated folk theorem [19, 21] asserts that any individually rational outcome of a finite-player single-stage game can be sustained in a Nash equilibrium of the infinitely repeated game, provided that the players are sufficiently future-aware. The constructive proofs of this classical result and many of its extensions rely on the notion of switch strategies, in which the players initially agree on the socially desirable set of actions to play. In case a player deviates from the agreed action, all other players effectively punish the deviator by switching to a set of actions that make the deviator worse off. This requires the ability to both detect a deviation and identify the deviator.

Several extensions of the folk theorem consider when the actions of others are not perfectly observable, thereby posing a difficulty in identifying deviators. These include when the players observe a common public outcome [22] or when they only observe private outcomes [20]. These works impose identifiability conditions on the stage game which essentially guarantee that each player can identify the history of others’ actions from the observable outcomes. Another extension of the folk theorem is to stochastic games [16] where the stage games are time-varying and depend on the previous actions of the players. The difficulty in this setting is that deviations are not only immediately beneficial, but could also take subsequent games into a regime that is profitable to the deviator on the long run. The authors impose conditions on the game which essentially guarantee that the long run cost of punishment outweighs the long run benefit of deviation.

All of these works consider that every stage game is played by the same finite set of players. A setting more closely related to ours is one in which two players in a large population are randomly matched in each stage, for which a folk theorem is shown in [40]. Each player is associated with a social status state that is observable by others. Deviators are punished by changing their status from good to bad, which all future matched players observe and punish.

Our setting differs fundamentally from the above cited works. We consider that the players have time-varying private preferences, namely their urgency to acquire a contested resource. In contrast to folk theorem results with private information [20], the privacy is with respect to the payoffs of the opponents, rather than the actions they play. In contrast to folk theorem results for stochastic games [16], the time-varying nature is with respect to the private player preferences rather than a fully observable game state. In the context of a folk theorem, the socially desirable set of actions in our setting is for the players to report their urgency truthfully such that the resource is allocated to the highest urgency player. It is not obvious how deviation from truthfulness can be detected in the first place. In principle, if the time-varying urgency process is public and the game is played with the same finite set of players, non-truthfulness can be detected on the long run by correlating the history of each player’s reports to the expected history [32]. But in a large population setting where maintaining explicit histories is infeasible, we interpret karma as an extension of the social status state in [40] to handle private preferences, essentially placing a budget on how often players can declare high urgency.

1.1.2 Mechanism Design Without Money

The famous Gibbard–Satterthwaite impossibility theorem [24, 48] poses a fundamental challenge in the design of resource allocation mechanisms. It asserts that when there are three or more alternative allocations, it is impossible to design a strategy-proof mechanism that is non-dictatorial when the domain of preferences is unrestricted. One avenue to escape this impossibility is the use of money, which imposes structure on the preferences of the users by measuring them against the objective monetary yardstick. The problem of designing monetary mechanisms is well studied, with positive results including the Vickrey–Clarke–Groves (VCG) mechanism [14, 28, 57], a general mechanism that is well known to be strategy-proof and to lead to efficient allocations. On the other hand, the design of mechanisms without money [49] is in general more difficult due to the lack of a general instrument that can be used to align incentives. Some successes include the cases when preferences are single-peaked [36], when each user has one item to trade [50], and when matching users pairwise in a bipartite graph [23], all of which leverage specific structures in the preferences of the users that are difficult to generalize. When users must express preferences over many alternatives, a general approach is the pseudo-market pioneered by [31] and famously adopted in the context of allocating course seats in business schools [8, 9, 52]. In a pseudo-market, users are given a finite budget of tokens to distribute over the alternatives, whose prices (in tokens) are set or discovered to clear the market (i.e., allocate the correct amount of resources to the correct number of users). However, pseudo-markets only promise to be Pareto-efficient and are also not strategy-proof (although strategizing becomes difficult when there are many users and alternatives) [31]. It is noteworthy that in our motivating examples, any allocation of the contested resource is Pareto-efficient.

The aforementioned difficulty stems from the fact that the classical mechanism design problem is concerned with a static or one-shot allocation of goods. On the other hand, when the allocation is dynamic or repeats over time, new opportunities for the design of strategy-proof and efficient mechanisms present themselves. On a conceptual level, just as money can be used to incentivize truthful behavior (or punish non-truthfulness), a similar incentive can be achieved through a promise of future service (or denial thereof). Despite this intuitive notion, the role that repetition could play in mechanism design has only recently been noticed, and the literature on mechanism design for dynamic resource allocation is sparse. A few recent works build upon the notion of “promised utilities,” pioneered in the context of contract design in repeated relationships [54]. These include [29], which develops an incentive compatible mechanism for the case when a single principal repeatedly allocates a single good to a single user, and [1], which extends this approach to the case when the single good is repeatedly allocated to one of the same set of contesting users. The working principle of these works is to find a set of future utilities that the principal promises to the user(s) as a function of their reported immediate utility, in a manner that incentivizes truthful reporting, while ensuring that the principal can recursively keep these future promises. This is based on the assumption that the principal has the power to commit to the promised utilities, without specifying the exact mechanism to do so. Similarly, karma is a device that encodes future promises; the higher a user’s karma, the more favorable its future position will be. But with karma, these promises need not be made explicit, and a single principal need not be held accountable for them. Instead, the future value of karma arises in a decentralized and natural manner as the users strategically ration its use.
The promise of future utility is made by the population as a whole by attributing the right of access to future resources to karma.

In other related works, [53] leverage the high likelihood of kidney transplant failures to incentivize participation in the kidney exchange by providing a priority for re-transplant to participants. [34] similarly incentivize participation in the kidney exchange by issuing participants a voucher for re-transplant that is also redeemable by their offspring. We consider our karma mechanisms to be complementary to these works, offering a simple and intuitive alternative that has the potential to scale to large systems and across multiple applications.

1.1.3 Artificial Currency Mechanisms

A special class of mechanisms without money, which is perhaps the most related to our karma mechanisms, is the so-called artificial currency or scrip mechanisms. These mechanisms have been proposed in multiple isolated application instances since the early 2000s. [25] propose a “point system” to address the problem of free-riding in peer-to-peer networks, where agents tend to download many more files than they upload. Similar works in the domain of peer-to-peer networks include [58], who specifically call their point system “karma” and focus on its cryptographic implementation rather than the design of the mechanism itself; and [18], who do incorporate elements of mechanism design but focus solely on the choice of a single parameter, which is the total amount of karma in their specific model. In the domain of transportation, [46] recently demonstrated how an artificial currency (also called “karma”) can be utilized instead of monetary tolls to achieve optimal routing in a two-arc road network. To the extent of our knowledge, the only concrete example of a real-life implementation of a karma-like concept is the “choice system” for the allocation of food donations to food banks in the USA [43]. There, food banks are allocated “shares” which they use to bid on the food donations they need. It is considered to be a major success, as evidenced by the active participation of food banks in the system as well as the unprecedented fluidity of food donations it has enabled.

All of the above works share the need for a non-monetary medium of exchange to coordinate the use of shared resources. However, there is an apparent lack of unity in the approaches taken, with most works proposing a problem-tailored, heuristic mechanism with little rigorous justification and scope for generalization. In [43], a model is presented that makes many simplifying assumptions on the strategic behavior of the users. This model does not truly capture the dynamic nature of the optimization problem of the users, who must ration their use of shares now to secure their future needs. In [46], a game-theoretic equilibrium is considered in which individual users solve a finite horizon dynamic optimization, but importantly, the amount of karma saved at the end of the horizon is treated as an exogenous parameter. A few other works that attempt to systematically study artificial currency mechanisms include [26, 33]. [33] studies a setting where a pool of users alternate between requesting and providing services to each other (e.g., a pool of parents exchanging baby-sitting services). This differs from our work in the following fundamental aspects. First, it does not give the users the flexibility to express the intensity of their preferences through a bidding procedure. Second, the equilibrium notion considered relies on remembering whether certain users refused to provide service before and punishing those users by never granting them service again. As discussed in Sect. 1.1.1, retaliation schemes are crucially based on the capability of detecting defection, which in our setting cannot be done without knowing the private preferences of the agents. [26] provides a general method to convert truthful monetary mechanisms to non-monetary ones, which relies on a central planner estimating how much money each user would spend in a finitely repeated monetary auction and giving the user a similar amount of artificial currency at the beginning of the horizon.
This requires central knowledge of the players’ private preferences and does not capture the important dynamic feedback process of gaining currency through yielding. In contrast to these works, our approach shows that robust and ultimately efficient behavior emerges solely from the dynamic strategic problem faced by the users of the karma mechanism, without additional rules or coordination mechanisms. To the extent of our knowledge, there are no other works that study the strategic behavior in artificial currency mechanisms at this level of generality. We believe that this is fundamental to the understanding of these mechanisms and serves as an important tool for mechanism design.

1.2 Organization of the Paper

Section 2 introduces the setting of dynamic resource allocation and the concept of karma mechanisms. In Sect. 3, we model karma mechanisms as dynamic population games and show that a stationary Nash equilibrium is guaranteed to exist. Section 4 focuses specifically on how different karma payment and redistribution rules can be incorporated in the model. The model is utilized in a numerical investigation of karma mechanisms in Sect. 5, where we provide insights on the emerging strategic behavior as well as the consequences of the karma mechanism design on the achieved efficiency and fairness of the resource allocation.

1.3 Notation

Let D be a discrete set and C be a continuous set. Let \(a,d \in D\) and \(c \in C\). For a function \(f: D \times C \rightarrow {{\mathbb {R}}}\), we distinguish the discrete and continuous arguments through the notation f[d](c). Alternatively, we write \(f: C \rightarrow {{\mathbb {R}}}^{|D |}\) as the vector-valued function f(c), with f[d](c) denoting its \(d^{\text {th}}\) element. Similarly, \(g[a \mid d](c)\) denotes the conditional probability of a given d and c. Specifically, \(g[d^+ \mid d](c)\) denotes one-step transition probabilities for d. We denote by \(\Delta (D):=\left\{ p \in {{\mathbb {R}}}_+^{|D |} |\sum _{d \in D} p[d] = 1 \right\} \) the set of probability distributions over the elements of D. For a probability distribution \(p \in \Delta (D)\), p[d] denotes the probability of element d. When considering heterogeneous agent types, we denote by \(x_\tau \) a quantity associated to type \(\tau \).

2 Karma Mechanisms for Dynamic Resource Allocation

We consider a population of agents \({{\mathcal {N}}}= \{1,\dots ,N\}\), where the number of agents N is typically large. For example, \({{\mathcal {N}}}\) is the set of ride-hailing platform riders in the metropolitan area of interest.

At discrete global time instants \(t \in {{\mathbb {N}}}\), two random agents from the population (denoted by \({{\mathcal {C}}}[t] \subset {{\mathcal {N}}}\)) compete for a scarce, indivisible resource, such as the closest ride-hailing driver. We are concerned with designing a mechanism that, at each interaction time t, selects one of the two agents in \({{\mathcal {C}}}[t]\) to allocate the resource to (i.e., grant the trip request).

A karma mechanism works as follows. Each agent \(l \in {{\mathcal {N}}}\) in the population is endowed with a non-negative integer counter \(k^l[t] \in {{\mathbb {N}}}\), called karma, which is private to the agent. Moreover, an additional surplus karma counter \(k^s [t] \in {{\mathbb {N}}}\) exists in the system.

At each interaction time t, each agent \(i \in {{\mathcal {C}}}[t]\) involved in the resource competition submits a sealed non-negative integer bid \(b^i[t] \in \{0,\dots ,k^i[t]\}\), which is bounded by the agent’s karma. The outcome of the interaction is determined by a resource allocation rule and a payment rule. The resource allocation rule decides which of the two competing agents is selected to receive the contended resource.

[Definition box: the resource allocation rule]

It is natural to consider a resource allocation rule that allocates the resource to the agent with the highest bid. A tie-breaking rule is needed when both bids coincide, and we use a fair coin toss for this purpose.
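As an illustration, the highest-bid rule with fair tie-breaking can be sketched in Python (the function name and signature are ours, not part of the mechanism's formal definition):

```python
import random

def allocate(bid_i: int, bid_j: int) -> int:
    """Return 0 if the first agent receives the resource, 1 otherwise.

    The resource goes to the highest bidder; a tie is resolved by a
    fair coin toss. Illustrative sketch only.
    """
    if bid_i > bid_j:
        return 0
    if bid_j > bid_i:
        return 1
    return random.randrange(2)  # fair coin toss on equal bids
```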

The karma payment rule determines the karma payments of the two competing agents.

[Definition box: the karma payment rule]

Note that the yielding agent makes a non-positive payment (i.e., it receives karma). As a consequence of this payment rule, at each interaction time t, the karma counters are updated as follows:

$$\begin{aligned} k^{i^*}[t+1]&\leftarrow k^{i^*}[t] - p^{i^*}[t]{} & {} \text {(selected agent}\ i^*[t]), \\ k^{-i^*}[t+1]&\leftarrow k^{-i^*}[t] - p^{-i^*}[t]{} & {} \text {(yielding agent}\ -i^*[t]), \\ k^s [t+1]&\leftarrow k^s [t] + p^{i^*}[t]+ p^{-i^*}[t]{} & {} \text {(surplus karma)}. \end{aligned}$$

Examples of karma payment rules are presented in Sect. 2.1. The surplus karma \(k^s [t]\) is meant to keep track of any excess karma payment in the interaction, in case \(p^{i^*}[t]+ p^{-i^*}[t]\ne 0\), such that this excess gets redistributed to the population agents. This redistribution occurs at time instants \(t^r \) in accordance with a karma redistribution rule.
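The counter updates above can be sketched as follows; the variable names are hypothetical, and the payments are those prescribed by whichever payment rule is in use:

```python
def settle_interaction(k_sel, p_sel, k_yld, p_yld, k_surplus):
    """Apply the karma-counter updates of a single interaction.

    k_sel, k_yld:  current karma of the selected and yielding agents.
    p_sel >= 0:    payment of the selected agent.
    p_yld <= 0:    (non-positive) payment of the yielding agent,
                   i.e., the yielder receives -p_yld karma.
    Any imbalance p_sel + p_yld accrues to the surplus counter.
    Illustrative sketch with hypothetical names.
    """
    k_sel_next = k_sel - p_sel
    k_yld_next = k_yld - p_yld
    k_surplus_next = k_surplus + p_sel + p_yld
    assert k_sel_next >= 0, "a payment cannot exceed the agent's karma"
    return k_sel_next, k_yld_next, k_surplus_next
```

Note that a fully peer-to-peer payment (the yielder receives exactly what the winner pays) leaves the surplus unchanged, while a rule in which the yielder receives less than the winner pays generates surplus to be redistributed later.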

[Definition box: the karma redistribution rule]

As a consequence of this redistribution rule, at each redistribution time \(t^r \) the karma counters are updated as follows:

$$\begin{aligned} k^{l}[t^r +1]&\leftarrow k^{l}[t^r ] + r^l[t^r ]{} & {} \text {(every agent}\ l), \\ k^s [t^r +1]&\leftarrow k^s [t^r ] - \sum _l r^l[t^r ]{} & {} \text {(surplus karma)}. \end{aligned}$$

A considerable freedom in the design of the karma payment and redistribution rules is possible. Hereafter, we present some possible examples, and in Sect. 5 we demonstrate how the choice of these rules affects the strategic behavior of the agents and allows the system designer to achieve different resource allocation objectives.

2.1 Examples of Karma Payment Rules

[Definition box: the \(\texttt {PBP}\) payment rule]

\(\texttt {PBP}\) is an example of a completely peer-to-peer karma payment rule. It has the advantage of not requiring the system-level surplus karma counter \(k^s [t]\) or any system-wide karma redistribution.

[Definition box: the \(\texttt {PBS}\) payment rule]

In contrast to \(\texttt {PBP}\), \(\texttt {PBS}\) is an example of a karma payment rule in which surplus karma is generated and needs to be redistributed. We will demonstrate in the numerical analysis in Sect. 5 that such a redistributive scheme can lead to higher levels of efficiency and fairness of the resource allocation.

2.2 Examples of Karma Redistribution Rules

There is a plethora of methods for redistributing the surplus karma to the agents in the population, as the designer has the freedom to decide both the redistribution times \(t^r \) and the redistribution rule. Nevertheless, we assume that redistribution rules are intended to completely redistribute the surplus karma, so that most of the karma in the system is held by the agents, i.e., the total karma held by the agents is a preserved quantity. We will formalize this assumption in the analysis in Sect. 4, where we will effectively assume that the surplus is kept at zero (either by not generating surplus or by redistributing it immediately).

One possibility is for the time instants \(t^r \) to be periodic events in which the redistribution occurs (e.g., every day at midnight). If \(k^s [t^r ]\) is not an integer multiple of the number of agents N, then a remainder is left to be redistributed in the next period. Another possibility is for the redistribution to occur asynchronously whenever \(k^s [t]\) exceeds N, so that a unit of karma per agent can be transferred. Finally, a non-uniform (possibly randomized) redistribution is possible.
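A sketch of the periodic uniform scheme, in which the surplus is divided equally and any integer remainder carries over to the next period (the names are ours):

```python
def redistribute_uniform(k_surplus: int, n_agents: int):
    """Split the surplus karma equally among the agents.

    Each agent receives k_surplus // n_agents karma; because karma is
    integer-valued, the remainder stays in the surplus counter and is
    carried over to the next redistribution time. Illustrative sketch.
    """
    share, remainder = divmod(k_surplus, n_agents)
    payouts = [share] * n_agents  # r^l[t^r] for every agent l
    return payouts, remainder     # remainder is the next k^s
```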

Beyond these examples, we will not detail here the specifics of the karma redistribution rule. However, it is interesting to note that redistributions need not be limited to positive karma values. For example, it is possible to reduce the karma of each agent according to a non-decreasing “tax” function \(0 \le h[k] \le k\), in order to create surplus (which will then be redistributed). We will briefly discuss the consequences of such a design in the numerical analysis in Sect. 5.

3 A Game-Theoretic Model for Karma Mechanisms

In this section, we develop a game-theoretic model to facilitate the analysis of karma mechanisms. The goal of the model is to address the following points:

  1. 1.

    Karma mechanisms induce a strategic scenario in which the agents must strategically choose their karma bids. The model will serve to demonstrate that this strategic scenario is well-posed by showing the existence of a suitable notion of equilibrium (the stationary Nash equilibrium, see Sects. 3.2, 3.3).

  2. 2.

    The model enables a computational tool to compute stationary best-response behavior of the agents. This tool can be used by the agents to derive their optimal bidding strategies.

  3. 3.

    The specifics of the karma mechanism (e.g., the karma payment and redistribution rules) affect the strategic behavior of the agents, which in turn affects population-level design objectives, in non-trivial ways. The model will serve as an important tool to make mechanism design choices, as demonstrated in the numerical analysis in Sect. 5.

The game played under the karma mechanism has a number of complicating features: it is an infinite dynamic game involving a large number of anonymous agents who have private states which depend on their past actions and, in turn, affect their future available actions. For this purpose, we build our model on the class of dynamic population games, following the formalism of [17].

3.1 The Karma Dynamic Population Game

3.1.1 Population Model

We consider that the number of agents N is large, such that they approximately form a continuum of mass. This is reasonable for many of our envisioned applications, e.g., the number of ride-hailing riders in a metropolitan area is typically large. We take the point of view of an ego agent playing against the population, from which a random anonymous opponent is uniformly drawn in every resource competition instance. Let i be the identity of the ego agent, and \(t^i\) be the time instants at which the ego agent is involved in the resource competition, i.e., \(i \in {{\mathcal {C}}}[t^i]\) for all \(t^i\). These are the only time instants of relevance to the ego agent, and we therefore model its state dynamics as a discrete-time Markov chain, with the discrete update events occurring at the global times \(t^i\). To simplify notation, we will drop the explicit time dependency, since the only time instants of interest are a current time of the ego agent \(t^i\) and a next time of the ego agent \(t^{i+} = \min \{t > t^i \mid i \in {{\mathcal {C}}}[t]\}\). We will also drop the superscript i since all quantities belong to the ego agent, unless explicitly stated otherwise. For example, we write k instead of \(k^i[t^i]\) to denote the ego agent’s current karma, and \(k^+\) instead of \(k^i[t^{i+}]\) to denote its next karma.

3.1.2 Agents’ Type and State

Each ego agent has a private static type \(\tau \in {{\mathcal {T}}}= \{\tau _1,\dots ,\tau _{n_\tau }\}\). The distribution of the agents’ types in the population is specified by the parameter \(g \in \Delta ({{\mathcal {T}}})\), with \(g_\tau \) denoting the fraction of agents belonging to type \(\tau \).

Moreover, each ego agent has a private time-varying state x which consists of an urgency state u and the karma k of the agent, i.e.,

$$\begin{aligned} x = [u, k]&\in {{\mathcal {X}}}= {{\mathcal {U}}}\times {{\mathbb {N}}},&u&\in {{\mathcal {U}}}= \{u_1,\dots ,u_{n_u}\},&k&\in {{\mathbb {N}}}. \end{aligned}$$

The urgency state represents a private valuation for the resource and takes one of the values in the discrete and finite set \({{\mathcal {U}}}\). It therefore corresponds to the cost incurred by the agent when they cannot procure the resource. For example, each value in \({{\mathcal {U}}}\) could correspond to different classes of trips and how important it is that the agent secures a ride-hail for those trips. The urgency at consecutive resource competition instances of an ego agent of type \(\tau \) follows an exogenous, irreducible Markov chain process with transition probabilities denoted by

$$\begin{aligned} \phi _\tau [u^+ \mid u]. \end{aligned}$$
(1)

This process makes it possible to model different assumptions on the temporal preferences of the agent. For example, a static urgency process models that the agent has an equal need for the resource at all times. Alternatively, the process can encode that the agent experiences exogenous events of high urgency in which its inconvenience for failing to acquire the resource is elevated (e.g., the cost of failing to secure a ride-hailing trip as a function of the trip length or purpose). While it is possible to extend the analysis to the case where the agent’s next urgency is affected by the local outcome of the interaction (e.g., failing to secure a ride-hailing trip today leads to higher urgency tomorrow due to accumulated delay), we insist on considering the urgency to be a function of exogenous events (e.g., whether the agent must catch a flight today). Moreover, we assume that the urgency processes of different agents are statistically independent.
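For concreteness, sampling the next urgency state from the transition probabilities (1) can be sketched as follows (the row-stochastic-matrix data layout is our assumption):

```python
import random

def next_urgency(u, phi):
    """Sample u+ from the exogenous urgency Markov chain of a type,
    where phi[u][v] is the transition probability from urgency u to
    urgency v, and each row of phi is a probability vector.
    Illustrative sketch."""
    weights = phi[u]
    return random.choices(range(len(weights)), weights=weights)[0]
```

For instance, an identity transition matrix keeps the urgency fixed over time, while an off-diagonal matrix makes the agent alternate between urgency levels.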

3.1.3 Social State

The joint distribution of the agents’ types and states in the population is given by

$$\begin{aligned} d \in {{\mathcal {D}}}= \left\{ d \in {{\mathbb {R}}}_{+}^{n_\tau \times \infty } |\; \forall \tau \in {{\mathcal {T}}}, \; \sum _{u,k} d_\tau [u,k] = g_\tau \right\} , \end{aligned}$$
(2)

where \(d_\tau [u,k]\) denotes the fraction of agents in type-state \([\tau , u,k]\). The type-state distribution d is a time-varying quantity whose dynamics evolve in global time rather than at the specific interaction times of the ego agent. However, we will be looking for conditions under which this distribution is stationary, in which case this difference is inconsequential.

The action of the ego agent is a non-negative integer bid which is limited by its karma

$$\begin{aligned} b \in {{\mathcal {B}}}^k:= \{b \in {{\mathbb {N}}}\mid b \le k\}. \end{aligned}$$

The ego agent of type \(\tau \) chooses its bid according to the homogeneous policy of its type

$$\begin{aligned} \pi _\tau : {{\mathcal {X}}}\rightarrow \Delta ({{\mathcal {B}}}^k) = \left\{ \sigma \in {{\mathbb {R}}}_+^{k+1} |\; \sum _b \sigma [b] = 1 \right\} , \end{aligned}$$

which maps its state [uk] to a probability distribution over the bids b. We denote by \(\pi _\tau [b \mid u,k]\) the probability of bidding b when the agent of type \(\tau \) is in state [uk]. The concatenation of the policies of all types \(\pi =(\pi _{\tau _1},\dots ,\pi _{\tau _{n_\tau }})\) is simply referred to as the policy. The set of policies is denoted by \(\Pi \).

The pair \((d,\pi ) \in {{\mathcal {D}}}\times \Pi \) is referred to as the social state,Footnote 2 as it gives a macroscopic description of the distribution of the agents’ states, as well as how they behave.

In order to characterize the Markov decision process that the ego agent faces, we will now turn to define an immediate reward function \({\zeta }_\tau [u,k,b](d,\pi )\) and a state transition function \({\rho }_\tau [u^+,k^+ \mid u,k,b](d,\pi )\). When the ego agent of type \(\tau \) is in state [uk] in the current resource competition instance and it bids b, \({\zeta }_\tau [u,k,b](d,\pi )\) gives its expected immediate reward, and \({\rho }_\tau [u^+,k^+ \mid u,k,b](d,\pi )\) gives the probability that, at its next resource competition, its state is \([u^+,k^+]\). Both the immediate reward and the state transition are functions of the social state \((d,\pi )\).

3.1.4 Immediate Reward Function

Since the ego agent gets matched with a random opponent from the population, an important quantity is the distribution of other agents’ bids, which can be readily derived from the social state as

$$\begin{aligned} {\nu }[b'](d,\pi ) = \sum _{\tau ', u', k'} d_{\tau '}[u',k'] \, \pi _{\tau '}[b' \mid u',k'], \end{aligned}$$
(3)

where \(b'\) (similarly, \(\tau '\), \(u'\), \(k'\)) denotes that these quantities belong to agents other than the ego agent. On a fundamental level, the ego agent is playing a game against this distribution, since it determines the likelihood of being selected to receive the resource for a given bid b, as well as the likelihood of transitioning to the next karma \(k^+\). Let us denote the resource competition outcome to the ego agent by \(o \in {{\mathcal {O}}}= \{0,1\}\), where \(o=0\) means that it is selected and \(o=1\) that it is yielding. Conditional on its bid b and the opposing bid \(b'\), the ego agent has the following probability of being selected

$$\begin{aligned} {{\mathbb {P}}}[o=0 \mid b, b'] = {\left\{ \begin{array}{ll} 1, &{}\text {if }\, b > b', \\ 0, &{}\text {if }\, b < b', \\ 0.5, &{}\text {if }\, b = b', \end{array}\right. } \end{aligned}$$
(4)

which lets us compute the probability of its resource competition outcome given its bid as a function of the social state as

$$\begin{aligned} {\gamma }[o \mid b](d,\pi ) = \sum _{b'} {\nu }[b'](d,\pi ) \, {{\mathbb {P}}}[o \mid b, b']. \end{aligned}$$
(5)

The ego agent incurs a cost equal to its urgency u when it yields the resource \((o=1)\), and zero cost otherwise \((o=0)\). This allows us to define the immediate reward function as

$$\begin{aligned} {\zeta }_\tau [u,k,b](d,\pi ) = {\zeta }[u,b](d,\pi ) = -u \, {\gamma }[o=1 \mid b](d,\pi ), \end{aligned}$$
(6)

which is negated to denote a reward rather than a cost. Note that it only depends on the urgency u and the bid b and not on the type \(\tau \) or the karma k. Note also that it is continuous in the social state \((d,\pi )\).
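The pipeline (3)–(6) is straightforward to evaluate numerically once the karma state is truncated. The following Python sketch does so for a single type with random toy data; the urgency values, the truncation level, and the random policy are all illustrative assumptions, not quantities from the paper.

```python
import numpy as np

# Toy single-type population with karma truncated at k_max (the paper
# works with unbounded karma; the cap is only for illustration).
n_u, k_max = 2, 3
U = np.array([1.0, 10.0])     # hypothetical urgency values

rng = np.random.default_rng(1)
d = rng.random((n_u, k_max + 1)); d /= d.sum()   # d[u, k]

# Random feasible policy: pi[u, k, b] = 0 for bids b exceeding karma k.
pi = rng.random((n_u, k_max + 1, k_max + 1))
for k in range(k_max + 1):
    pi[:, k, k + 1:] = 0.0
    pi[:, k] /= pi[:, k].sum(axis=-1, keepdims=True)

# (3): distribution of the opposing bid b'.
nu = np.einsum('uk,ukb->b', d, pi)

# (4)-(5): probability of being selected (o = 0) given own bid b,
# winning ties with probability one half.
def gamma_win(b, nu):
    return nu[:b].sum() + 0.5 * nu[b]

# (6): immediate reward zeta[u, b] = -u * P[o = 1 | b].
zeta = np.array([[-U[u] * (1.0 - gamma_win(b, nu)) for b in range(k_max + 1)]
                 for u in range(n_u)])
```

As expected from (4)–(6), the winning probability, and hence the immediate reward, is nondecreasing in the bid.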

3.1.5 State Transition Function

The urgency of the ego agent at its next resource competition instance follows the exogenous process \(\phi _\tau [u^+ \mid u]\). In contrast, the ego agent’s next karma depends on multiple factors, including its current bid, the resource competition outcome, the specifics of the karma payment rule, and whether a karma redistribution event occurs before its next resource competition instance (see Fig. 2). We abstract this dependency with the karma transition function

$$\begin{aligned} \kappa [k^+ \mid k,b,o](d,\pi ), \end{aligned}$$
(7)

which lets us express the state transition function as

$$\begin{aligned} {\rho }_\tau [u^+,k^+ \mid u, k, b](d,\pi ) = \phi _\tau [u^+ \mid u] \sum _o {\gamma }[o \mid b](d,\pi ) \, \kappa [k^+ \mid k,b,o](d,\pi ). \end{aligned}$$
(8)
Fig. 2
figure 2

Timeline of the resource competition instances, highlighting the times relevant for modeling an ego agent i’s karma Markov chain. A redistribution event could affect the karma transitions

Section 4 details the specifics of how to derive the karma transition function (7) for the cases of pay bid to peer (\(\texttt {PBP}\)) and pay bid to society (\(\texttt {PBS}\)). Here, we highlight two general properties of (7), (8) that are necessary for the main technical results to follow.

Assumption 1

(Continuity of the state transition function) The state transition function \({\rho }_\tau [u^+,k^+ \mid u, k, b](d,\pi )\) defined in (8) is continuous in \((d,\pi )\).

The continuity assumption is hardly restrictive, since the dependency on \((d,\pi )\) typically arises in the form of expectations, as demonstrated in Sect. 4.

Assumption 2

(Karma preservation in expectation) Karma is preserved for all \((d,\pi )\), when taking the expectation over the entire population, i.e.,

$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} [\tau ,u,k] \sim d \\ b \sim \pi _\tau [\cdot \mid u,k] \end{array}}[k^+] = \mathop {\mathbb {E}}\limits _{[\tau ,u,k] \sim d}[k], \end{aligned}$$
(9)

which expands to

$$\begin{aligned}{} & {} \sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \sum _o {\gamma }[o \mid b](d,\pi ) \sum _{k^+} \kappa [k^+ \mid k,b,o](d,\pi ) \, k^+ \\{} & {} \quad = \sum _{\tau ,u,k} d_\tau [u,k] \, k. \end{aligned}$$
(KP)

Intuitively, Assumption 2 requires that the karma held by the agents is preserved, either because no surplus karma is generated or because it is promptly redistributed. This can be guaranteed by appropriate design of the payment rules or it can be achieved under some assumptions on the karma redistribution scheme as is demonstrated in Sect. 4.

3.2 Solution Concept: Stationary Nash Equilibrium

3.2.1 Best Response

We assume that the ego agent of type \(\tau \) discounts its future rewards with the discount factor \(\alpha _\tau \in [0,1)\). Then, the expected immediate reward of the ego agent of type \(\tau \) when it follows the policy \(\pi _\tau \) is

$$\begin{aligned} R_\tau [u,k](d,\pi ) = \sum _b \pi _\tau [b \mid u,k] \, {\zeta }[u,b](d,\pi ), \end{aligned}$$

and its state transition probabilities are

$$\begin{aligned} P_\tau [u^+,k^+ \mid u,k](d,\pi ) = \sum _b \pi _\tau [b \mid u,k] \, {\rho }_\tau [u^+,k^+ \mid u,k,b](d,\pi ). \end{aligned}$$

The expected infinite horizon reward is therefore recursively defined as

$$\begin{aligned} V_\tau [u,k](d,\pi )= & {} R_\tau [u,k](d,\pi ) \nonumber \\{} & {} +~\alpha _\tau \sum _{u^+,k^+} P_\tau [u^+,k^+ \mid u,k](d,\pi ) \, V_\tau [u^+,k^+](d,\pi ). \end{aligned}$$
(10)

Equation (10) is the well-known Bellman recursion for the fixed policy \(\pi _\tau \). We next show that it has a unique solution that is continuous in \((d,\pi )\).

Lemma 1

Let Assumption 1 hold. Then the solution of (10) is unique and continuous in \((d,\pi )\).

Proof

First we show uniqueness. Let \(V_\tau (d,\pi )\) be the vector formed by stacking \(V_\tau [x](d,\pi )\) for all \(x \in {{\mathcal {X}}}\). It is straightforward to show that \(\Vert V_\tau (d,\pi )\Vert _\infty \le \frac{u_{\max }}{1 - \alpha _\tau }\), where \(u_{\max }\) is the maximal element of the finite set \({{\mathcal {U}}}\).Footnote 3 Therefore, \(V_\tau (d,\pi )\) lies in the Banach space of bounded sequences \((\ell ^\infty , \Vert \cdot \Vert _\infty )\). For a fixed social state \((d,\pi )\), let \(T_\tau ^{(d,\pi )}: \ell ^\infty \rightarrow \ell ^\infty \) be the map defined by the right-hand side of (10) (in vector form), i.e., \(T_\tau ^{(d,\pi )}(v) = R_\tau (d,\pi ) + \alpha _\tau \, P_\tau (d,\pi ) \, v\). Observe that \(V_\tau (d,\pi )\) is a fixed point of \(T_\tau ^{(d,\pi )}\), which we show to be unique by showing that \(T_\tau ^{(d,\pi )}\) is a contraction mapping, i.e.,

$$\begin{aligned} \Vert T_\tau ^{(d,\pi )}(v) - T_\tau ^{(d,\pi )}(v')\Vert _{\infty }&= \Vert \alpha _\tau \, P_\tau (d,\pi ) \, (v - v')\Vert _{\infty } \nonumber \\&= \alpha _\tau \, \max _x \left|\sum _{x^+} P_\tau [x^+ \mid x](d,\pi ) \, (v[x^+] - v'[x^+]) \right|\nonumber \\&\le \alpha _\tau \, \max _x \sum _{x^+} |P_\tau [x^+ \mid x](d,\pi ) \, (v[x^+] - v'[x^+]) |\nonumber \\&\le \alpha _\tau \left( \max _x \sum _{x^+} P_\tau [x^+ \mid x](d,\pi ) \right) \left( \max _{x^+} |v[x^+] - v'[x^+]|\right) \nonumber \\ {}&= \alpha _\tau \, \Vert v - v'\Vert _{\infty }. \end{aligned}$$
(11)

Since \(\alpha _\tau \in [0,1)\), this proves that \(T_\tau ^{(d,\pi )}\) is contractive.

Consider next the normed space \(({{\mathcal {D}}}\times \Pi , \Vert \cdot \Vert )\) (with an arbitrary norm) and the function \(V_\tau : {{\mathcal {D}}}\times \Pi \rightarrow \ell ^\infty \) defined as the unique fixed point of \(T_\tau ^{(d,\pi )}\). We show that \(V_\tau \) is continuous at every \((d,\pi ) \in {{\mathcal {D}}}\times \Pi \). Fix \(\epsilon > 0\). Choose \(\epsilon ' = (1 - \alpha _\tau ) \, \epsilon \) and \(\delta > 0\) such that \(\Vert (d,\pi ) - (d',\pi ')\Vert< \delta \Rightarrow \Vert T_\tau ^{(d,\pi )}(V_\tau (d,\pi )) - T_\tau ^{(d',\pi ')}(V_\tau (d,\pi ))\Vert _\infty < \epsilon '\). Such a \(\delta \) is guaranteed to exist since \(T_\tau ^{(d,\pi )}(v)\) is continuous in \((d,\pi )\) for any fixed v (\(R_\tau (d,\pi )\) is continuous, and so is \(P_\tau (d,\pi )\) as a consequence of Assumption 1). Then, we have for any \((d',\pi ')\) such that \(\Vert (d,\pi ) - (d',\pi ')\Vert < \delta \),

$$\begin{aligned}&\Vert V_\tau (d,\pi ) - V_\tau (d',\pi ')\Vert _\infty \\&\quad = \Vert V_\tau (d,\pi ) - T_\tau ^{(d',\pi ')}(V(d,\pi )) + T_\tau ^{(d',\pi ')}(V_\tau (d,\pi )) - V_\tau (d',\pi ')\Vert _\infty \\&\quad \le \Vert T_\tau ^{(d,\pi )}(V_\tau (d,\pi )) - T_\tau ^{(d',\pi ')}(V_\tau (d,\pi ))\Vert _\infty \\&\qquad + \Vert T_\tau ^{(d',\pi ')}(V_\tau (d,\pi )) - T_\tau ^{(d',\pi ')}(V_\tau (d',\pi '))\Vert _\infty \\&\qquad < (1 - \alpha _\tau ) \, \epsilon + \alpha _\tau \, \Vert V_\tau (d,\pi ) - V_\tau (d',\pi ')\Vert _\infty , \end{aligned}$$

where the last inequality follows from (11). Manipulating yields \(\Vert V_\tau (d,\pi ) - V_\tau (d',\pi ')\Vert _\infty <\epsilon \), showing continuity. \(\square \)
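Since \(T_\tau ^{(d,\pi )}\) is a contraction, the fixed point \(V_\tau (d,\pi )\) can be computed by standard value iteration. A minimal Python sketch on a toy truncated state space, with random arrays standing in for \(R_\tau (d,\pi )\) and \(P_\tau (d,\pi )\):

```python
import numpy as np

# Toy data: the paper's state space is countably infinite; here we
# truncate to n_x states and draw R, P at random for illustration.
rng = np.random.default_rng(2)
n_x, alpha = 6, 0.9

R = rng.random(n_x)                                            # rewards
P = rng.random((n_x, n_x)); P /= P.sum(axis=1, keepdims=True)  # row-stochastic

def bellman_fixed_point(R, P, alpha, tol=1e-10):
    """Iterate T(v) = R + alpha * P v to its unique fixed point (Banach)."""
    v = np.zeros_like(R)
    while True:
        v_next = R + alpha * P @ v
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next

V = bellman_fixed_point(R, P, alpha)
# V satisfies V = R + alpha * P V, i.e., (I - alpha P) V = R.
```

On a finite truncation the same fixed point can be obtained by solving the linear system \((I - \alpha_\tau P_\tau) V_\tau = R_\tau\) directly; the iteration mirrors the contraction argument of the proof.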

The ego agent’s single-stage deviation reward (commonly known as the Q-function) is

$$\begin{aligned} Q_\tau [u,k,b](d,\pi )= & {} {\zeta }[u,b](d,\pi ) \nonumber \\{} & {} +~~\alpha _\tau \sum _{u^+,k^+} {\rho }_\tau [u^+,k^+ \mid u,k,b](d,\pi ) \, V_\tau [u^+,k^+](d,\pi ), \end{aligned}$$
(12)

which is the expected infinite horizon reward when the ego agent deviates from the policy \(\pi _\tau \) for a single resource competition instance by bidding b at the state [uk], then follows \(\pi _\tau \) in the future resource competition instances. Consequently, the state-dependent best response correspondence of the ego agent is

$$\begin{aligned}&B_\tau [u,k](d,\pi )\\&\quad \in \left\{ \sigma \in \Delta ({{\mathcal {B}}}^k) |\; \forall \sigma ' \in \Delta ({{\mathcal {B}}}^k), \; \sum \limits _b \left( \sigma [b] - \sigma '[b] \right) Q_\tau [u,k,b](d,\pi ) \ge 0 \right\} . \end{aligned}$$
(BR)

This is the set of probability distributions over the bids maximizing the expected single-stage deviation reward of the ego agent of type \(\tau \) when its state is [uk] and the social state is \((d,\pi )\).
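Concretely, (BR) says that any probability distribution supported on the maximizers of \(Q_\tau [u,k,\cdot ](d,\pi )\) is a best response. A small Python sketch with hypothetical Q-values:

```python
import numpy as np

# Sketch of the best-response set (BR): any distribution supported on the
# maximizers of Q over the feasible bids is a best response.
def best_response_support(Q_row, tol=1e-9):
    """Indices of bids attaining the maximum of Q_tau[u, k, .](d, pi)."""
    return np.flatnonzero(Q_row >= Q_row.max() - tol)

Q_row = np.array([0.3, 0.7, 0.7, 0.1])    # hypothetical Q over bids 0..3
support = best_response_support(Q_row)     # bids 1 and 2 are both optimal

# One element of the best-response set: the uniform mixture over the support.
sigma = np.zeros_like(Q_row)
sigma[support] = 1.0 / len(support)
```

When the maximizer is unique the set (BR) collapses to a single pure bid; ties, as in the toy Q above, are what make it a correspondence rather than a function.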

3.2.2 Stationary Nash Equilibrium

We are now ready to define the solution concept that we adopt for this game.

Definition 1

(Stationary Nash equilibrium) A stationary Nash equilibrium is a social state \(({{\varvec{d}}},{\varvec{\pi }}) \in {{\mathcal {D}}}\times \Pi \) which satisfies for all \([\tau ,u,k] \in {{\mathcal {T}}}\times {{\mathcal {U}}}\times {{\mathbb {N}}}\)

$$\begin{aligned} {{\varvec{d}}}_\tau [u,k]&= \sum _{u^-,k^-} {{\varvec{d}}}_\tau [u^-,k^-] \, P_\tau [u,k \mid u^-,k^-]({{\varvec{d}}},{\varvec{\pi }}), \end{aligned}$$
(SNE.1)
$$\begin{aligned} {\varvec{\pi }}_\tau [\cdot \mid u,k]&\in B_\tau [u,k]({{\varvec{d}}},{\varvec{\pi }}). \end{aligned}$$
(SNE.2)

The stationary Nash equilibrium is similar to the classical notion of the Nash equilibrium in that it denotes a state of the game where agents have no incentive to unilaterally deviate from the equilibrium policies of their types \({\varvec{\pi }}_\tau \) (SNE.2), but additionally requires that the type-state distribution \({{\varvec{d}}}\) is stationary under the stochastic processes characterized by the transition probabilities \(P_\tau [u^+,k^+ \mid u,k]({{\varvec{d}}},{\varvec{\pi }})\) (SNE.1). This stationarity condition implies that the ego agent need not consider the dynamics of the type-state distribution \({{\varvec{d}}}\) in its strategic behavior. Moreover, since the number of agents is large, the ego agent cannot unilaterally alter \({{\varvec{d}}}\) to further improve its rewards. Therefore, the equilibrium policies \({\varvec{\pi }}_\tau \) are indeed present and future optimal.

3.3 Existence of Stationary Nash Equilibrium

In [17], it is shown that a stationary Nash equilibrium is guaranteed to exist in every dynamic population game when the state space \({{\mathcal {X}}}\) is finite. We now extend this result to the karma dynamic population game, where the state space is countably infinite due to the karma state \(k \in {{\mathbb {N}}}\). Observe that the set of type-state distributions \({{\mathcal {D}}}\), given in (2), is a convex subset of the Banach space of absolutely summable infinite sequences \((\ell ^1,\Vert \cdot \Vert _1)\). This is because the elements \(d \in {{\mathcal {D}}}\) can be represented as the infinite sequence \(\{\sigma [n]\}_{n \in {{\mathbb {N}}}}\) with

$$\begin{aligned}{} & {} (\sigma [0],\dots ,\sigma [n_\tau -1],\sigma [n_\tau ],\dots ,\sigma [n_\tau \,n_u-1],\sigma [n_\tau \,n_u],\dots )\nonumber \\{} & {} \quad =(d_{\tau _1}[u_1,0],\dots ,d_{\tau _{n_\tau }}[u_1,0],d_{\tau _1}[u_2,0],\dots ,d_{\tau _{n_\tau }}[u_{n_u},0],d_{\tau _1}[u_1,1],\dots ). \end{aligned}$$
(13)

Trivially, this sequence is absolutely summable, with \(\sum _n |\sigma [n]|= \sum _{\tau ,u,k} d_\tau [u,k] = 1\).
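The flattening (13) corresponds to the index map \(n = k \, n_\tau \, n_u + j \, n_\tau + i\) for the type-state \([\tau _{i+1}, u_{j+1}, k]\) (with zero-based indices i, j). A one-line Python sketch:

```python
# Zero-based index map implementing the sequence representation (13):
# type index i in {0, ..., n_tau - 1}, urgency index j in {0, ..., n_u - 1},
# karma level k in {0, 1, ...}.
def flat_index(i, j, k, n_tau, n_u):
    """Position of d_{tau_{i+1}}[u_{j+1}, k] in the flattened sequence."""
    return k * n_tau * n_u + j * n_tau + i
```

The type index varies fastest and the karma level slowest, matching the ordering displayed in (13).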

Let us further restrict \({{\mathcal {D}}}\) to the subset of type-state distributions which respect a fixed average amount of karma \({{\bar{k}}}\in {{\mathbb {N}}}\), denoted by:

$$\begin{aligned} {{{\mathcal {D}}}^{{\bar{k}}}}= \left\{ d \in {{\mathcal {D}}}|\; \sum _{\tau , u, k} d_\tau [u,k] \, k= {{\bar{k}}}\right\} . \end{aligned}$$
(14)

This is also a convex subset of \(\ell ^1\). Furthermore, it is compact in \(\ell ^1\), as we show next using the following auxiliary definition and lemma.

Definition 2

(Equismall at infinity, [55] p.451) A subset \(\Sigma \) of \(\ell ^1\) is said to be equismall at infinity if, for every \(\epsilon > 0\), there is an integer \(n_\epsilon \ge 0\) such that

$$\begin{aligned} \sum _{n \ge n_\epsilon } |\sigma [n]|< \epsilon , \quad \text {for all } \sigma \in \Sigma . \end{aligned}$$

Lemma 2

(Compactness in \(\ell ^1\), [55, Theorem 44.2]) The following properties of a subset \(\Sigma \) of \(\ell ^1\) are equivalent:

  1. (a)

    \(\Sigma \) is compact;

  2. (b)

    \(\Sigma \) is bounded, closed, and equismall at infinity.

Corollary 1

\({{{\mathcal {D}}}^{{\bar{k}}}}\) is a compact subset of \(\ell ^1\).

Proof

The set \({{{\mathcal {D}}}^{{\bar{k}}}}\) is trivially closed since it is an intersection of closed polytopes. It is also trivially bounded, since \(0 \le d_\tau [u,k] \le 1\) for all \(d \in {{{\mathcal {D}}}^{{\bar{k}}}}\), \([\tau ,u,k] \in {{\mathcal {T}}}\times {{\mathcal {U}}}\times {{\mathbb {N}}}\). It therefore suffices to show that it is equismall at infinity. For any \(\epsilon > 0\), choose \(k_\epsilon \in {{\mathbb {N}}}\) such that \(\frac{{{\bar{k}}}}{k_\epsilon } < \epsilon \), and \(n_\epsilon = n_\tau \, n_u \, k_\epsilon \). For an arbitrary \(d \in {{{\mathcal {D}}}^{{\bar{k}}}}\), let \(\{\sigma [n]\}_{n \in {{\mathbb {N}}}}\) be its sequence representation as given in (13). We have

$$\begin{aligned}{} & {} {{\bar{k}}}= \sum _{\tau ,u,k} d_\tau [u,k] \, k \ge \sum _{\tau ,u,k \ge k_\epsilon } d_\tau [u,k] \, k \ge k_\epsilon \sum _{\tau ,u,k \ge k_\epsilon } d_\tau [u,k] = k_\epsilon \sum _{n \ge n_\epsilon } |\sigma [n]|\\{} & {} \quad \Leftrightarrow \sum _{n \ge n_\epsilon } |\sigma [n]|\le \frac{{{\bar{k}}}}{k_\epsilon } < \epsilon . \end{aligned}$$

\(\square \)
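The key step in the proof is a Markov-type tail bound: a fixed average karma \({{\bar{k}}}\) forces the mass on high karma levels to be uniformly small. A numeric sanity check in Python, using a hypothetical geometric karma marginal with mean \({{\bar{k}}}= 5\) (an assumed distribution chosen purely for illustration):

```python
import numpy as np

# Hypothetical karma marginal: geometric on {0, 1, ...} with mean 5,
# truncated far into the tail for the computation.
p = 1.0 / (1 + 5)
k = np.arange(2000)
d = (1 - p) ** k * p            # d[k] = P[karma = k]
k_bar = float((d * k).sum())    # approximately 5

def tail_mass(d, k_eps):
    """Mass on karma levels k >= k_eps."""
    return float(d[k_eps:].sum())

# As in the proof of Corollary 1, tail_mass(d, k_eps) <= k_bar / k_eps,
# so choosing k_eps > k_bar / eps makes the tail smaller than eps,
# uniformly over all distributions with average karma k_bar.
```

The bound does not depend on the shape of the distribution, which is exactly what equismallness at infinity requires.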

The compactness of \({{{\mathcal {D}}}^{{\bar{k}}}}\) will enable us to invoke an infinite dimensional version of Kakutani’s fixed point theorem to establish the existence of a stationary Nash equilibrium. Before we do, we need to ensure that the fixed point correspondence maps elements of \({{{\mathcal {D}}}^{{\bar{k}}}}\) into itself. For a fixed policy \(\pi \in \Pi \), define the map \(W^\pi : {{\mathcal {D}}}\rightarrow {{\mathcal {D}}}\) as the concatenation of the right-hand side of condition (SNE.1) for all \([\tau ,u,k] \in {{\mathcal {T}}}\times {{\mathcal {U}}}\times {{\mathbb {N}}}\), i.e.,

$$\begin{aligned} W_{\tau }^\pi [u,k](d) = \sum _{u^-,k^-} d_\tau [u^-,k^-] \, P_\tau [u,k \mid u^-,k^-](d,\pi ). \end{aligned}$$
(15)

That \(W^\pi \) maps elements of \({{\mathcal {D}}}\) to itself follows trivially from the fact that \(P_\tau [u^+,k^+ \mid u,k](d,\pi )\) are transition probabilities. We further have the following lemma.

Lemma 3

Let Assumption 2 hold. Then for all \({{\bar{k}}}\in {{\mathbb {N}}}\) and \(\pi \in \Pi \), \(W^\pi \) maps \({{{\mathcal {D}}}^{{\bar{k}}}}\) into itself.

Proof

For a \(d \in {{{\mathcal {D}}}^{{\bar{k}}}}\), the average amount of karma of \(W_{\tau }^\pi (d)\) is

$$\begin{aligned}&\sum _{\tau ,u^+,k^+} W_{\tau }^\pi [u^+,k^+](d) \, k^+ \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \sum _{u^+,k^+} P_\tau [u^+,k^+ \mid u,k](d,\pi ) \, k^+ \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u, k] \sum _o {\gamma }[o \mid b](d,\pi ) \sum _{k^+} \kappa [k^+ \mid k, b, o](d,\pi ) \, k^+ \sum _{u^+} \phi _\tau [u^+ \mid u] \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u, k] \sum _o {\gamma }[o \mid b](d,\pi ) \sum _{k^+} \kappa [k^+ \mid k, b, o](d,\pi ) \, k^+ \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \, k = {{\bar{k}}}, \end{aligned}$$

where we used the non-negativity of the summands to exchange the order of the infinite sums [45], and condition (KP). Therefore, \(W^\pi (d) \in {{{\mathcal {D}}}^{{\bar{k}}}}\). \(\square \)

We are now ready to apply the following infinite dimensional fixed point theorem to establish our main technical result: the existence of a stationary Nash equilibrium in karma dynamic population games (Theorem 1).

Lemma 4

(Kakutani–Glicksberg–Fan fixed point theorem, [27, Theorem 8.6]) Let C be a compact convex subset of a locally convex Hausdorff space E, and let \(S: C \rightarrow 2^C\) be a set-valued correspondence which is upper hemicontinuous, nonempty, compact and convex. Then S has a fixed point.

Theorem 1

(Existence of a stationary Nash equilibrium in karma dynamic population games) Let Assumption 1 and 2 hold. Then for each \({{\bar{k}}}\in {{\mathbb {N}}}\), a stationary Nash equilibrium \(({{\varvec{d}}},{\varvec{\pi }})\) satisfying \({{\varvec{d}}}\in {{{\mathcal {D}}}^{{\bar{k}}}}\) is guaranteed to exist.

Proof

We can write the stationary Nash equilibrium conditions (SNE.1), (SNE.2) as the fixed points of the correspondence defined as

$$\begin{aligned} S(d,\pi ) = \left( W^\pi (d), \{B_\tau [u,k](d,\pi )\}_{[\tau ,u,k]}\right) , \end{aligned}$$

where \(\{B_\tau [u,k](d,\pi )\}_{[\tau ,u,k]}\) is the sequence of best responses at all type-states \([\tau ,u,k] \in {{\mathcal {T}}}\times {{\mathcal {U}}}\times {{\mathbb {N}}}\).

  • The set \(C = {{{\mathcal {D}}}^{{\bar{k}}}}\times \prod \limits _{\tau ,u,k} \Delta ({{\mathcal {B}}}^k)\) is a compact subset of the locally convex Hausdorff space \(E = \ell ^1 \times \prod \limits _{\tau ,u,k} {{\mathbb {R}}}^{k+1}\), by Corollary 1 and Tychonoff’s theorem [56]. C is also trivially convex.

  • S maps C into subsets of C, by Lemma 3 and the definition of the best response.

  • S is upper hemicontinuous and nonempty, by the continuity of \(P_\tau (d,\pi )\) and \(Q_\tau [u,k,b] (d,\pi )\) in \((d,\pi )\), and Berge’s maximum theorem [2].

  • S is compact and convex, since \(W^\pi (d)\) is a singleton and \(B_\tau [u,k](d,\pi )\) is the set of convex mixtures over the finite number of bids maximizing \(Q_\tau [u,k,b](d,\pi )\).

It follows from Lemma 4 that S is guaranteed to have a fixed point, which coincides with a stationary Nash equilibrium. Since \({{\bar{k}}}\) above was arbitrary, this holds for each \({{\bar{k}}}\in {{\mathbb {N}}}\). \(\square \)

The significance of Theorem 1 lies in establishing that karma mechanisms induce a well-posed game in which a rational behavior exists and is well defined. Consequently, one can rigorously study the long-term social welfare implications of karma mechanisms at the stationary Nash equilibrium. The uniqueness of the stationary Nash equilibrium, as well as whether different learning dynamics are guaranteed to converge to it, remain open research questions.

3.4 Discussion of Incentive Compatibility

We now turn to discuss the classical notion of incentive compatibility (also known as strategy-proofness or truthfulness) in the context of karma mechanisms, in particular with respect to the bidding and resource allocation that happens at every interaction between the agents. Following [35, 38], we say that an auction-like mechanism is incentive compatible if the optimal (selfish) action of each agent is to bid their truthful valuation of the contended resource, thus revealing their private preference. Notice that, unlike classical monetary instruments, karma does not possess any intrinsic value, as it has no use outside of the game. It does, however, acquire value as a means of exchange in the game, and one could attempt to define a notion of incentive compatibility with respect to the value of karma given by the expected infinite horizon reward in (10). However, we argue that such a notion of incentive compatibility is not critical for the efficiency of the resource allocation (i.e., the allocation of the resource to the highest urgency agent) at the stationary Nash equilibrium of the karma mechanism. First, this is supported by the numerical analysis in Sect. 5, where near-optimal efficiency is robustly observed for a wide range of settings under all of the karma mechanism designs considered. Second, unlike the classical monetary setting, incentive compatibility with respect to the value of karma does not guarantee efficiency. This is because the value of karma depends on the contingent private state of the agent (the immediate urgency, but also the current karma balance and how urgent the agent expects to be in the future). For this reason, even if the karma mechanism were incentive compatible, a truthful bid would not be a perfect revelation of the agent’s urgency. It is important to highlight that in this work, we develop the tools to predict the strategic behavior under general karma mechanisms.
Therefore, we are able to robustly assess the resource allocation efficiency of these mechanisms when agents bid optimally according to their own self-interest, without the need for incentive compatibility as an intermediate step.

This is not to say that incentive compatibility is not a desirable property of the karma mechanism. It will assist in the process of learning optimal policies by providing optimal feedback when agents bid truthfully with respect to their value of karma (which also needs to be learnt; it corresponds to the value function that solves the Bellman equation given in (10)). Moreover, it will likely lead to robustness against uncertain information about the social state. The precise effect of the karma mechanism design on the learning process of the agents remains an exciting open research question.

4 Modeling of Karma Payment and Redistribution Rules

We now revisit the karma payment and redistribution rules introduced in Sect. 2, show how the karma transition function (7) in Sect. 3.1.5 can be specialized to model them, and verify that they satisfy Assumption 1 and 2.

A key difference when it comes to deriving the karma transition function is whether the payment rule generates no surplus karma, as in pay bid to peer (\(\texttt {PBP}\)), or generates surplus karma, as in pay bid to society (\(\texttt {PBS}\)). In the case of no surplus karma, the ego agent’s karma at the next resource competition instance is fully determined by the outcome of the current instance, making the karma transition probabilities easier to model. When surplus karma is generated, redistribution needs to occur, which in full generality might or might not happen between successive interactions of the ego agent (see Fig. 2). Extra care must be taken in order to guarantee that Assumption 2 holds, and we will do so by introducing additional modeling assumptions that guarantee that the surplus karma is entirely redistributed between successive interactions of the ego agent.

4.1 Payment Rules with no Surplus Karma

In the pay bid to peer (\(\texttt {PBP}\)) karma payment rule, the ego agent pays its bid if it is selected, and otherwise it gets paid the opposing bid \(b'\). Consequently, the conditional probability of its next karma is

$$\begin{aligned} {{\mathbb {P}}}[k^+ \mid k,b,b',o] = {\left\{ \begin{array}{ll} 1, &{}\text {if }\, o = 0 \text { and } k^+ = k - b, \\ 1, &{}\text {if }\, o = 1 \text { and } k^+ = k + b', \\ 0, &{}\text {otherwise}, \end{array}\right. } \end{aligned}$$

which leads to the following karma transition function

$$\begin{aligned} \kappa ^\texttt {PBP}[k^+ \mid k,b,o](d,\pi ) = \frac{\sum \nolimits _{b'} {\nu }[b'](d,\pi ) \, {{\mathbb {P}}}[o \mid b, b'] \, {{\mathbb {P}}}[k^+ \mid k,b,b',o]}{{\gamma }[o \mid b](d,\pi )}. \end{aligned}$$
(16)

It is straightforward to verify that (16) satisfies the continuity assumption (Assumption 1). Karma preservation in expectation (Assumption 2) is also satisfied for all \((d,\pi )\) (that we omit from the notation), as

$$\begin{aligned}&\sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \sum _o {\gamma }[o \mid b] \sum _{k^+} \kappa ^\texttt {PBP}[k^+ \mid k,b,o] \, k^+ \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \sum _{b'} {\nu }[b'] \sum _o {{\mathbb {P}}}[o \mid b,b'] \sum _{k^+} {{\mathbb {P}}}[k^+ \mid k,b,b',o] \, k^+ \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \\&\qquad \quad \sum _{b'} {\nu }[b'] \left( {{\mathbb {P}}}[o=0 \mid b,b'] \, (k - b) + {{\mathbb {P}}}[o=1 \mid b,b'] \, (k + b')\right) \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \, k - \sum _{b',b>b'} {\nu }[b] \, {\nu }[b'] \, b + \sum _{b,b'>b} {\nu }[b] \, {\nu }[b'] \, b' \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \, k. \end{aligned}$$
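Under \(\texttt {PBP}\), karma preservation in fact holds samplewise, not only in expectation: the winner’s payment goes entirely to the loser. A Monte Carlo sketch in Python, where random feasible bids stand in for equilibrium policies (an assumption for illustration):

```python
import numpy as np

# Monte Carlo check of karma conservation under PBP: in each pairwise
# interaction the winner pays its bid to the loser, so the population's
# total karma never changes.
rng = np.random.default_rng(4)
n_agents, T = 100, 1000
karma = np.full(n_agents, 10)

for _ in range(T):
    i, j = rng.choice(n_agents, size=2, replace=False)
    b_i = rng.integers(0, karma[i] + 1)   # random feasible bids stand in
    b_j = rng.integers(0, karma[j] + 1)   # for equilibrium policies
    if b_i > b_j or (b_i == b_j and rng.random() < 0.5):
        winner, loser, pay = i, j, b_i    # ties broken by a fair coin, as in (4)
    else:
        winner, loser, pay = j, i, b_j
    karma[winner] -= pay
    karma[loser] += pay

# Total karma is invariant: still n_agents * 10, and no balance goes negative
# because bids are capped by karma.
```

This samplewise conservation is stronger than condition (KP), which only requires preservation in expectation over the population.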

4.2 Payment Rules with Surplus Karma

We make the following assumption to ease the modeling of payment rules which generate surplus karma to be redistributed to all the agents.

Assumption 3

(Synchronous matching and redistribution) At every time instant t, the whole population is randomly matched in simultaneous pairwise resource competition instances, and all surplus karma is redistributed immediately.

Under the pay bid to society (\(\texttt {PBS}\)) karma payment rule, the ego agent pays its bid if it is selected, and pays nothing otherwise. Its conditional payment is hence given by

$$\begin{aligned} p^\texttt {PBS}[b,o] = {\left\{ \begin{array}{ll} b, &{}\text {if }\, o = 0, \\ 0, &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

Due to Assumption 3, the average generated surplus can be computed by letting the ego agent assume the role of all the agents in the population, whose type-states are distributed as per d and who follow the policies \(\pi _\tau \)

$$\begin{aligned} {\bar{p}}^\texttt {PBS}(d,\pi )&= \sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \sum _o {\gamma }[o \mid b](d,\pi ) \, p^\texttt {PBS}[b,o] \\&=\sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \, {\gamma }[o=0 \mid b](d,\pi ) \, b. \end{aligned}$$

This gets redistributed to all the agents using the following integer-preserving redistribution rule (although other redistribution rules could be employed, as long as they redistribute the entire surplus):

  • distribute \(\left\lfloor {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rfloor \) to a fraction \(f^{\text {low}}(d,\pi ):=\left\lceil {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rceil - {\bar{p}}^\texttt {PBS}(d,\pi )\) of agents, randomly selected;

  • distribute \(\left\lceil {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rceil \) to the remaining fraction \(f^{\text {high}}(d,\pi ):= 1 - f^{\text {low}}(d,\pi )\) of agents.

Consequently, the probability that the ego agent receives a surplus payment of \(\left\lfloor {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rfloor \) (respectively, \(\left\lceil {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rceil \)) is \(f^{\text {low}}(d,\pi )\) (respectively, \(f^{\text {high}}(d,\pi )\)), resulting in the following karma transition function

$$\begin{aligned}{} & {} \kappa ^\texttt {PBS}[k^+ \mid k,b,o](d,\pi ) \nonumber \\{} & {} \quad = {\left\{ \begin{array}{ll} f^{\text {low}}(d,\pi ), &{}\text {if }\, o = 0 \text { and } k^+ = k - b + \left\lfloor {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rfloor , \\ f^{\text {high}}(d,\pi ), &{}\text {if }\, o = 0 \text { and } k^+ = k - b + \left\lceil {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rceil , \\ f^{\text {low}}(d,\pi ), &{}\text {if }\, o = 1 \text { and } k^+ = k + \left\lfloor {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rfloor , \\ f^{\text {high}}(d,\pi ), &{}\text {if }\, o = 1 \text { and } k^+ = k + \left\lceil {\bar{p}}^\texttt {PBS}(d,\pi ) \right\rceil , \\ 0, &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(17)

It is straightforward to verify that (17) satisfies the continuity assumption (Assumption 1). Karma preservation in expectation (Assumption 2) is also satisfied, as

$$\begin{aligned}&\sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \sum _o {\gamma }[o \mid b] \, \sum _{k^+} \kappa ^\texttt {PBS}[k^+ \mid k,b,o] \, k^+ \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \left( {\gamma }[o=0 \mid b] \, (k - b + {\bar{p}}^\texttt {PBS}) + {\gamma }[o=1 \mid b] \, (k + {\bar{p}}^\texttt {PBS}) \right) \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \, k + {\bar{p}}^\texttt {PBS} - \sum _{\tau ,u,k} d_\tau [u,k] \sum _b \pi _\tau [b \mid u,k] \, {\gamma }[o=0 \mid b] \, b \\&\quad = \sum _{\tau ,u,k} d_\tau [u,k] \, k, \end{aligned}$$

where we use \(f^{\text {low}} \left\lfloor {\bar{p}}^\texttt {PBS} \right\rfloor + f^{\text {high}} \left\lceil {\bar{p}}^\texttt {PBS} \right\rceil = {\bar{p}}^\texttt {PBS}\).
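As a numerical sanity check, the floor/ceiling randomization can be sketched as follows (a minimal illustration; the function name and interface are ours, not part of the mechanism):

```python
import math
import random

def pbs_surplus_payments(p_bar, n_agents, rng):
    """Draw integer surplus payments whose expectation is exactly p_bar:
    a fraction f_low = ceil(p_bar) - p_bar of agents receives floor(p_bar),
    the remaining fraction f_high = 1 - f_low receives ceil(p_bar)."""
    lo, hi = math.floor(p_bar), math.ceil(p_bar)
    f_low = hi - p_bar  # probability of the lower payment
    return [lo if rng.random() < f_low else hi for _ in range(n_agents)]
```

The identity \(f^{\text {low}} \lfloor {\bar{p}} \rfloor + f^{\text {high}} \lceil {\bar{p}} \rceil = {\bar{p}}\) holds by construction, so the empirical mean payment converges to \({\bar{p}}^\texttt {PBS}\); when \({\bar{p}}\) is an integer, every agent simply receives \({\bar{p}}\).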

5 Numerical Analysis

In this section, we perform a numerical analysis of karma mechanisms, providing insights on the strategic behaviors that emerge at the stationary Nash equilibrium, and their consequences on the social welfare. We first define the social welfare measures in Sect. 5.1, then analyze the performance of the mechanisms in a demonstrative case study in Sect. 5.2. Finally, we test the robustness of the mechanisms to heterogeneity of the agents in Sects. 5.3 and 5.4.

As detailed in Appendix A, all the stationary Nash equilibria presented were computed using a dynamic equilibrium-seeking algorithm that is inspired by evolutionary dynamics in population games [17, 47].

5.1 Social Welfare Measures and Benchmark Resource Allocation Schemes

In order to quantitatively assess the performance of karma mechanisms, we introduce the following social welfare measures, along with benchmark resource allocation schemes that optimize them. As a baseline, we take a resource allocation scheme that simply allocates the resource in every competition instance based on a fair coin toss. We denote this scheme by \(\texttt {COIN}\).

5.1.1 Efficiency

We define efficiency as

$$\begin{aligned} \text {eff} = \lim _{T \rightarrow \infty } \frac{1}{T} \mathop {\mathbb {E}}\limits \left[ \sum _{t = 0}^{T-1} \sum _{i \in {{\mathcal {C}}}[t]} \frac{{\zeta }^i[t]}{2}\right] , \end{aligned}$$
(18)

which is the expected average reward of the two agents involved in the infinitely repeated resource competition instances. At the stationary Nash equilibrium \(({{\varvec{d}}},{\varvec{\pi }})\) of the continuous population model, (18) evaluates to

$$\begin{aligned} \text {eff}({{\varvec{d}}},{\varvec{\pi }}) = \sum _{\tau ,u,k} {{\varvec{d}}}_\tau [u,k] \, R_\tau [u,k]({{\varvec{d}}},{\varvec{\pi }}), \end{aligned}$$

which is the expected average reward per resource competition instance of an ego agent assuming the role of all the agents (leveraging the stationarity of \({{\varvec{d}}}\)).
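Concretely, with the stationary distribution and the reward function stored as arrays, this evaluation is a single weighted sum (a sketch; the array layout and names are ours):

```python
import numpy as np

def efficiency(d, R):
    """Efficiency at a stationary Nash equilibrium:
    eff(d, pi) = sum_{tau,u,k} d[tau,u,k] * R[tau,u,k],
    the expected reward per competition of an ego agent drawn from d.
    Shapes: d[tau, u, k] sums to 1; R has the same shape."""
    d, R = np.asarray(d, float), np.asarray(R, float)
    assert np.isclose(d.sum(), 1.0), "d must be a probability distribution"
    return float((d * R).sum())
```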

A benchmark resource allocation scheme which maximizes the efficiency is known as the omniscient benevolent dictator, who has access to the agents’ private urgency and allocates the resource to the agent with the highest urgency. We denote this scheme by \(\texttt {DICT}\).

5.1.2 Ex-post Access Fairness and Ex-post Reward Fairness

In line with the literature on randomized resource allocations (e.g., [11]), the following ex-post fairness measures are defined for finite time horizons T and particular realizations of the repeated resource allocations.Footnote 4

Let \(w^i_T\) be the fraction of times agent i was selected to receive the resource (with respect to the times it was involved in resource competitions). The ex-post access fairness is defined via the standard deviation of \(w^i_T\) with respect to the different agents, i.e.,

$$\begin{aligned} {\text {af}}_T = -\mathop {\text {std}}\limits _{l \in {{\mathcal {N}}}} w^l_T, \quad w^i_T = \frac{1}{T} \sum _{s=0}^{T-1} \left[ i = i^*[t^i_s]\right] . \end{aligned}$$

A benchmark dynamic resource allocation scheme which aims to maximize the ex-post access fairness is one that ensures that the agents take turns accessing the resource, by selecting the agent who has received the resource the least fraction of times in the past. We denote this scheme by \(\texttt {TURN}\).

Let instead \({\bar{{\zeta }}}^i_T\) be agent i’s mean reward over the competitions it was involved in. The ex-post reward fairness is then defined via the standard deviation of \({\bar{{\zeta }}}^i_T\) with respect to the different agents, i.e.,

$$\begin{aligned} {\text {rf}}_T = -\mathop {\text {std}}\limits _{l \in {{\mathcal {N}}}} {\bar{{\zeta }}}^l_T, \quad {\bar{{\zeta }}}^i_T = \frac{1}{T} \sum _{s=0}^{T-1} \zeta ^i[t^i_s]. \end{aligned}$$

Notice that, in contrast to the ex-post access fairness, ex-post reward fairness cannot be evaluated without knowing the private urgency of the agents.
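Given simulation logs, both ex-post measures reduce to negative standard deviations across agents, so that zero is perfectly fair and more negative values are less fair (a sketch; the function name and log format are our own):

```python
import numpy as np

def expost_measures(won, rewards):
    """Ex-post access and reward fairness from per-agent logs.

    won[i]     : sequence of booleans, one per competition of agent i,
                 True if agent i received the resource
    rewards[i] : sequence of per-competition rewards of agent i
    """
    w = np.array([np.mean(wi) for wi in won])       # access fractions w^i_T
    z = np.array([np.mean(zi) for zi in rewards])   # mean rewards zbar^i_T
    return -np.std(w), -np.std(z)
```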

5.2 Case Study: Homogeneous Agents with Rare High-Urgency State

We showcase our results in a scenario where the agents are homogeneous, i.e., they all follow the same urgency process \(\phi \) and have the same future discount factor \(\alpha \) (and there is only one type \(\tau \), which we drop from the notation in this section). We will investigate the role of heterogeneity in karma mechanisms in Sects. 5.3 and 5.4. The agents are typically lowly urgent (\(u=1\)), with rare occurrences of high urgency (\(u=10\)). The agents can anticipate when they will be highly urgent ahead of time. This is represented by the following urgency process:

[Urgency process (19): transition diagram of a three-state Markov chain over the default low urgency (\(u=1\)), intermediate low urgency (\(u=1\)), and high urgency (\(u=10\)) states; diagram omitted.]

Notice that there are two low urgency states; the first is the ‘default’ state in which the agents find themselves most of the time, and the second is an ‘intermediate’ state which has a high probability of transitioning to the high urgency state.

For example, \(u=1\) default represents a regular day, \(u=1\) intermediate a regular day where the agent anticipates it must go to the airport during rush hour tomorrow, and \(u=10\) the day of that important trip.
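Since the diagram of process (19) is not reproduced here, the following sketch uses illustrative transition probabilities of our own choosing, not the paper’s, that merely match the qualitative description (the default state is sticky, the intermediate state very likely leads to high urgency, and high urgency is rare):

```python
import numpy as np

# States: 0 = u=1 (default), 1 = u=1 (intermediate), 2 = u=10 (high).
# These probabilities are hypothetical placeholders.
phi = np.array([
    [0.95, 0.05, 0.0],  # default: occasionally anticipate a big need
    [0.10, 0.00, 0.9],  # intermediate: high urgency very likely next
    [0.95, 0.05, 0.0],  # high urgency: back to a regular day
])

# Stationary urgency distribution: solve pi = pi @ phi with sum(pi) = 1.
A = np.vstack([phi.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Under these assumed numbers the high urgency state is occupied roughly 4% of the time, consistent with a “rare high-urgency event”.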

Fig. 3

Stationary Nash equilibrium with urgency process (19), karma rule \(\texttt {PBS}\), future discount factor \(\alpha =0.98\)

Figure 3 shows the stationary Nash equilibrium computed for the case when the karma payment rule is pay bid to society (PBS) and the agents discount their future rewards with \(\alpha =0.98\). The average amount of karma per agent is \({{\bar{k}}}=10\). The top of the figure shows the equilibrium bidding policy \({\varvec{\pi }}\) at each urgency state, where for a given level of karma (x-axis) the intensity of the red color denotes the probabilistic weight placed on the bids (y-axis), and disallowed bids that exceed the karma budget are displayed in gray. The bottom of the figure shows the stationary joint urgency-karma distribution \({{\varvec{d}}}\). The stationary Nash equilibrium exhibits multiple intuitive behaviors. First, agents bid parsimoniously, in order to save karma for the future rather than maximize their immediate chances of success. Second, agents bid more in the high urgency state than in the low urgency states, thereby effectively signalling their urgency. Interestingly, the agents bid zero when they are low on karma in the intermediate low urgency state, in order to gather karma for the anticipated high urgency state. As a consequence, high urgency agents typically have more karma.

Figure 4 shows the performance of the karma mechanisms with respect to the social welfare measures of efficiency, ex-post access fairness and ex-post reward fairness, as a function of the agents’ future discount factor \(\alpha \). In generating this figure, for each value of \(\alpha \), we ran agent-based simulations with \(N=200\) agents who were randomly matched in a total of \(T=1000\) interactions per agent and bid according to the stationary Nash equilibrium policy for \(\alpha \). Each simulation was repeated 10 times in order to construct the displayed confidence intervals. Efficiency and ex-post access fairness are plotted jointly as a trade-off chart on the left side of the figure, and the ex-post reward fairness is plotted on the right. We compare the performance under the karma payment rules \(\texttt {PBP}\) and \(\texttt {PBS}\) and the benchmark resource allocation schemes introduced in Sect. 5.1. As expected, the best efficiency is achieved by \(\texttt {DICT}\), the best ex-post access fairness by \(\texttt {TURN}\), and the baseline \(\texttt {COIN}\) performs poorly in all measures. Interestingly, the performance of \(\texttt {PBP}\) coincides with \(\texttt {COIN}\) when the agents are fully myopic (\(\alpha =0\)). In this case, the equilibrium policy can be computed in closed formFootnote 5; it is a dominant strategy for the agents to bid all their karma since there is no sense in saving it for the future. Under \(\texttt {PBP}\), this leads to all the karma in the system being in the possession of one single agent at a time, rendering an essentially random allocation among all the other agents who have no karma. In contrast, this does not occur under \(\texttt {PBS}\) due to the karma redistribution, and while the bid-all behavior is inefficient under this payment rule as well, it preserves some of the turn-taking capability of the karma.
In fact, the performance of \(\texttt {PBS}\) dominates that of \(\texttt {PBP}\) across all values of \(\alpha \) and for all of the social welfare measures considered, highlighting the advantage of incorporating a redistributive scheme rather than a strictly peer-to-peer scheme. This advantage comes at a price, since redistributive schemes such as \(\texttt {PBS}\) require some degree of centralization to keep track of and redistribute the surplus karma. In many cases, however, it is natural to consider that the agents have a reasonably high value of \(\alpha \), since they are expected to remain in the system for a long time. Interestingly, both \(\texttt {PBP}\) and \(\texttt {PBS}\) perform similarly well in these cases, showing that the performance of the karma mechanisms is robust to the specifics of the mechanism design in many reasonable scenarios. Remarkably, both payment rules approach the optimal efficiency of \(\texttt {DICT}\), without ever having to access the agents’ private urgency. At the same time, they vastly outperform \(\texttt {DICT}\) both in terms of ex-post access fairness and ex-post reward fairness, demonstrating that the karma mechanisms are successful in both achieving fair turn-taking, as well as catering to the agents’ varying temporal needs. This occurs as long as the agents do have some future discounting. A severe degradation in the ex-post fairness occurs in the “pathological” case when the agents do not discount their future (\(\alpha =1\)). This interesting case is discussed separately in Appendix B as it requires different analysis tools to those presented in Sect. 3.
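The fully myopic (\(\alpha =0\)) bid-all dynamics can be illustrated with a toy agent-based simulation (our own sketch under assumed parameters, not the paper’s simulator): under \(\texttt {PBP}\) the entire karma stock ends up being passed around a single agent at a time, while \(\texttt {PBS}\) keeps it dispersed.

```python
import random

def simulate_bid_all(rule, n=50, k0=10.0, T=50000, seed=1):
    """Myopic agents bid their entire karma budget; the higher bid wins
    (ties broken at random). PBP: the winner pays its bid to the loser.
    PBS: the winner pays its bid to society, redistributed uniformly
    (fractional karma allowed for brevity)."""
    rng = random.Random(seed)
    k = [k0] * n
    for _ in range(T):
        i, j = rng.sample(range(n), 2)
        # higher karma = higher bid wins; random float breaks ties
        w, l = (i, j) if (k[i], rng.random()) > (k[j], rng.random()) else (j, i)
        bid = k[w]
        k[w] -= bid
        if rule == "PBP":
            k[l] += bid
        else:  # "PBS"
            for a in range(n):
                k[a] += bid / n
    return k
```

With these (assumed) parameters, `simulate_bid_all("PBP")` ends with one agent holding the whole karma stock, whereas `simulate_bid_all("PBS")` keeps karma spread across the population, mirroring the turn-taking argument above.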

Fig. 4

Performance of \(\texttt {PBP}\) and \(\texttt {PBS}\) karma payment rules when there is a rare high urgency event, as a function of the future discount factor \(\alpha \)

Fig. 5

Comparison of stationary Nash equilibria under karma payment rules \(\texttt {PBS}\) and \(\texttt {PBP}\) for the low future discount factor \(\alpha =0.7\) (left), and under \(\texttt {PBP}\) for multiple future discount factors (right)

To provide insight into why \(\texttt {PBS}\) outperforms \(\texttt {PBP}\) for low values of the future discount factor \(\alpha \), as well as why the performance of the karma mechanisms improves with increasing \(\alpha \), we compare a number of stationary Nash equilibria in Fig. 5. Here, we compactly represent the equilibrium bidding policies through the mean bids, and only present the results for the default low urgency and the high urgency states (omitting the intermediate low-urgency state). The first two columns of the figure compare the stationary Nash equilibria under \(\texttt {PBS}\) and \(\texttt {PBP}\) for the relatively low value of \(\alpha =0.7\). Observe that under \(\texttt {PBP}\), a significant mass of agents is expected to be low on karma when highly urgent; these agents therefore fail to signal their high urgency against a lowly urgent opponent, contributing to the loss of efficiency observed in Fig. 4. In contrast, under \(\texttt {PBS}\) the mass of the stationary karma distribution at the high urgency state is concentrated in a region where the agents will effectively outbid lowly urgent opponents most of the time, explaining the superior performance of \(\texttt {PBS}\) in terms of efficiency. This occurs due to the redistribution of karma, which ensures that the agents are sufficiently far from having critically low karma.

A similar mechanism is responsible for the improved efficiency at higher values of the future discount factor \(\alpha \), as the rightmost three columns of Fig. 5 demonstrate by contrasting the stationary Nash equilibria under \(\texttt {PBP}\) for \(\alpha \in \{0.7,0.95,0.99\}\) (qualitatively similar results hold for \(\texttt {PBS}\)). Instead of relying on karma redistribution, highly future aware agents learn to be sparing in the use of karma, in order to avoid the situation of being highly urgent and low on karma, in which karma loses its effectiveness as a signaling device. This precisely exemplifies how repetition can be leveraged to align the agents’ incentives, and suggests that karma is an effective instrument for this purpose. Additionally, the mass of agents that have critically low karma values is generally much smaller for \(\alpha =0.99\) than for \(\alpha =0.7\) (at all urgency states), which contributes to the improved ex-post access fairness.

5.3 Robustness to Heterogeneous Future Discount Factors

Fig. 6

Robustness to heterogeneous future discount factors, without (top) and with (bottom) karma tax

In this section, we consider a mixed population where the agents can have one of two future awareness types; half of the agents heavily discount the future reward (\(\alpha _1 = 0.7\)) while the other half are strongly future-aware (\(\alpha _2 = 0.99\)). This heterogeneity can have one of two interpretations. In the first interpretation, the future discount factors are true representatives of the agents’ objectives, e.g., the \(\alpha _1\) agents are expecting to exit the system sooner than the \(\alpha _2\) agents. In the second interpretation, all the agents are expected to remain in the system for the same (infinite) time, and the heterogeneity represents differences in their strategic competence, i.e., the \(\alpha _1\) agents are less patient than the \(\alpha _2\) agents. This is the interpretation we focus on here. We would like to investigate the extent to which karma mechanisms are graceful with respect to this difference.

Figure 6 shows stationary Nash equilibrium results for \(\texttt {PBS}\) (top) and \(\texttt {PBS}+\texttt {TAX}\) (bottom), where in the latter we collect a small progressive karma tax of the form \(h[k] = 0.005 \, k^2\) from all the agents and redistribute it uniformly. The mathematical modeling of the karma tax follows similar principles as the redistributive payment rule \(\texttt {PBS}\) (see Sect. 4.2). The defining feature of Fig. 6 is that in the untaxed case, a slight ‘under-bidding’ behavior of the \(\alpha _2\) agents leads them to accumulate significantly more karma than the \(\alpha _1\) agents in the long run. As the ex-ante access fairness and ex-ante reward fairness plots demonstrateFootnote 6, this leads to some degree of unfairness between the two types, with the \(\alpha _2\) agents getting access to the resource a higher fraction of times, as well as experiencing higher average rewards (although the disparity is reasonably small). Nonetheless, applying a karma tax is an effective measure to equalize this disparity, since it both disincentivizes the \(\alpha _2\) agents from holding on to too much karma, and also redistributes some of that karma to the \(\alpha _1\) agents. This serves as a demonstration of the freedom that karma mechanisms give to the system designer, who has a principled tool to achieve different resource allocation objectives.
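The tax step can be sketched as a simple collect-and-redistribute operation (our sketch of one standalone tax round; the paper applies the tax within the equilibrium model of Sect. 4.2, and only the rate 0.005 is taken from the text):

```python
import numpy as np

def karma_tax(k, rate=0.005):
    """Collect the progressive tax h[k] = rate * k**2 from every agent
    and redistribute the total uniformly. Total karma is preserved,
    and agents above the mean tax burden make a net payment."""
    k = np.asarray(k, dtype=float)
    tax = rate * np.square(k)
    return k - tax + tax.sum() / k.size
```

For example, with karma levels \((5, 15)\) the taxed levels become \((5.5, 14.5)\): the quadratic form makes the richer agent a net payer, which is exactly the equalizing effect observed in Fig. 6.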

5.4 Robustness to Heterogeneous Urgency Processes

Thus far in our numerical analysis we have considered that all agents have the same urgency process. The homogeneity of the urgency process facilitates the interpersonal comparability [44] of utility between agents, and ultimately it allows us to define a simple notion of efficiency. It is important to notice, however, that each agent’s urgency process is completely private and its only purpose is to encode the user’s temporal preference for when they would prefer to acquire the resource. It enables comparing the value of the resource for the same agent at different times, rather than comparing utilities across different agents.

We have looked at the effect of introducing some invadersFootnote 7 in the population. We assumed that these invaders have a different urgency process in that they are in a high-urgency state more often than the nominal population. We have considered the case of a small and a large subpopulation of invaders.

The numerical results are reported in Table 1. It is evident that agents that present a higher frequency of the high-urgency state are not granted additional resources under the \(\texttt {PBS}\) karma mechanism. On the contrary, the karma mechanism incentivizes agents to identify their most urgent instances parsimoniously. In contrast, the benchmark strategy \(\texttt {DICT}\) allocates additional resources to the high-urgency invading subpopulation. This behavior illustrates a key feature of the proposed “self-contained” karma mechanism that differentiates it from monetary schemes: fairness of the resource allocation emerges intrinsically from the mechanism and is not affected by exogenous factors of inequality between agents. Uneven allocation of the resource is possible if desired, but it requires deliberate design choices such as non-uniform karma redistribution rules.

Table 1 Probability of acquiring the resource in the case of agents with heterogeneous urgency processes

6 Conclusion

We have demonstrated the effectiveness of karma mechanisms for the dynamic allocation of common resources. These mechanisms make it possible to achieve highly efficient and fair allocations when the resources are repeatedly disputed, without requiring access to the users’ private preferences, and without resorting to monetary pricing, which is problematic in many important domains. The efficiency and fairness of karma mechanisms are robustly observed in multiple numerical cases involving different mechanism designs, preference structures, and user heterogeneity.

We show that it is possible to rigorously study the strategic behavior of the users of a karma mechanism by modeling it as a dynamic population game, in which a stationary Nash equilibrium is guaranteed to exist. We numerically investigate the karma stationary Nash equilibrium, providing insights on the strategic behaviors that emerge and on their consequences for the social welfare. We also provide examples of how our model can be a versatile mechanism design tool for the system designer that wants to affect these behaviors and achieve different resource allocation objectives.

Future work includes applying karma mechanisms in the specific motivating use cases, which include the allocation of ride-hailing trips, autonomous intersection management, as well as traffic and/or internet congestion management. We believe that many more applications are possible. We would also like to investigate the surprisingly understudied notions of fairness in (infinitely) repeated resource allocations, and develop axiomatic principles and specifications to guide the design of karma mechanisms. Moreover, our analysis suggests that karma mechanisms are robust to some types of user heterogeneity, but a comprehensive analysis of the practical effects of more forms of heterogeneity is desirable for some applications. Finally, we remark that effective strategic play by the agents is how karma acquires value in a karma mechanism. An important open research question is how the users of a karma mechanism can learn their optimal bidding strategy from repeated play in a distributed fashion.