1 Introduction

For kidney patients, kidney transplantation is still the most effective treatment, resulting in a significantly longer life expectancy compared to dialysis. However, the demand for available kidneys has consistently exceeded supply. Moreover, a kidney transplantation might not be possible due to blood-type or tissue-type incompatibilities between a patient and a willing donor. A solution is to place all patient–donor pairs in one pool, such that donors can be swapped in a cyclic manner. More formally, an \(\ell\)-way exchange involves \(\ell\) distinct patient–donor pairs \((p_1,d_1),\ldots , (p_\ell ,d_\ell )\), where for \(i\in \{1,\ldots ,\ell -1\}\), donor \(d_i\) donates to patient \(p_{i+1}\), and donor \(d_{\ell }\) donates to patient \(p_1\). A kidney exchange programme (KEP) is a centralized programme whose goal is to find an optimal kidney exchange scheme in a pool of patient–donor pairs, subject to an upper bound \(\ell\) on the cycle length.

Recently, national KEPs have started to collaborate, leading to a number of international KEPs (IKEPs). In 2016, the first international kidney exchange took place, between Austria and the Czech Republic [22]. In 2018, Italy, Spain and Portugal started to collaborate [48]. In 2019, Scandiatransplant, an organization for sharing deceased-donor organs among six Scandinavian countries, started an IKEP involving Swedish and Danish transplant centers. Even though overall solutions will improve, individual rationality might not be guaranteed, that is, individual countries could be worse off. To improve the stability of an IKEP, the following question is therefore highly relevant:

What kind of fairness must we guarantee to ensure that all countries in an IKEP place all their patient–donor pairs in an international pool?

Individual rationality [5, 6] and fairness versus optimality [4, 20, 32, 46] were initially studied for national KEPs, in particular in the US. However, the situation in the US differs from that in many other countries. The US has three nationwide KEPs (UNOS, APD, NKR) [2], and US hospitals work independently and compete with each other. Hence, US hospitals tend to register only their hard-to-match patient–donor pairs with the national KEPs, while trying to process their easy-to-match pairs immediately. As a consequence, the aforementioned papers focused on mechanisms that give hospitals incentives to register all their patient–donor pairs with the KEP. In particular, NKR (the largest nationwide KEP in the US) uses a credit system to incentivize hospitals to register their easy-to-match pairs as well, by giving negative credits for registering hard-to-match pairs and positive credits for registering easy-to-match pairs.

1.1 Our setting

We consider IKEPs in the setting of European KEPs, which are scheduled in rounds, typically once every three months [14]. Unlike the US setting, this setting allows a search for optimal exchange schemes. Hence, the situation where easy-to-match patient–donor pairs are taken out of the pool is no longer relevant. Below we discuss existing work for the European setting. As we will see, the credit system proposed for the European setting [19, 36] differs from the one used by NKR due to the different nature of the European and US settings.

We first note that the search for an optimal exchange scheme can be done in polynomial time for 2-way exchanges (matchings) but becomes NP-hard as soon as 3-way exchanges are permitted [1].

Carvalho and Lodi [24] used a 2-round system for ensuring stability of IKEPs with 2-way exchanges only: in the first round each country selects an internal matching, and in the second round a maximum matching is selected for the international exchanges. They gave a polynomial-time algorithm for computing a Nash equilibrium that maximizes the total number of transplants, improving the previously known result of [23] for two countries.

Sun et al. [44] also considered 2-way exchanges only. They defined so-called selection ratios using various lower and upper target numbers of kidney transplants for each country. In their setting, a solution is considered to be fair if the minimal ratio across all countries is maximized. They also required a solution to be a maximum matching and individually rational. They gave theoretical results specifying which models admit solutions with all three properties. Moreover, they provided polynomial-time algorithms for computing such solutions, if they exist.

Klimentova et al. [36] introduced a credit system to incentivize the countries to collaborate by sharing the joint benefits in a fair way. That is, each country is allocated in each round of the IKEP a “fair” target number of kidney transplants for that country. The differences between the actual number of transplants for a country and its target number are used as credits for the next round. In their simulations, they allowed 3-way exchanges for four countries. They considered the potential and benefit value for the initial allocations, which become the target allocations after the credit adjustment. Their results showed that using the benefit value yields slightly more balanced solutions. Biró et al. [16] compared the benefit value with the Shapley value. In their simulations, for three countries allowing 3-way exchanges, they found that the Shapley value produced smaller deviations from the targets on average.

Biró et al. [19] considered credit-based compensation systems from a theoretical point of view. They only allowed for 2-way exchanges but, unlike [16, 36], with the possibility of having weights for representing transplant utilities. They gave a polynomial-time algorithm for finding a maximum matching that minimizes the largest country deviation from a target allocation. They also showed that the introduction of weights makes the problem NP-hard. In [12], the polynomial-time algorithm of [19] was generalized to a polynomial-time algorithm for computing a maximum matching that lexicographically minimizes the country deviations from a given target allocation. In [10], the theoretical results from [12] and [19] are unified and extended.

1.2 Our contributions

We perform a large-scale experimental study (up to 15 countries) on finding balanced solutions in IKEPs. Our motivation is threefold. Firstly, the number of countries participating in IKEPs keeps growing. Secondly, we aim to measure the effect of using maximum matchings that are lexicographically minimal. For this, we need to consider IKEPs with a large number of participating countries (otherwise, maximum matchings that minimize the largest country deviation from a target allocation are probably already lexicographically minimal). Thirdly, motivated by the promising results for the Shapley value [16, 36], we also want to thoroughly investigate the effect of using widely accepted solution concepts from cooperative game theory. That is, we model the rounds of an IKEP as so-called partitioned matching games, which were formally introduced in [19]. This indeed allows us to use well-understood solution concepts from cooperative game theory for prescribing the initial “fair” allocations. We define all game-theoretic notions that we need in Sect. 2.

In Sect. 3 we explain the credit system of [12, 16, 19, 36] in our setting. Whilst [16, 36] allowed 3-way exchanges, we only allow 2-way exchanges, just as [23, 24, 44]. Similar to [16, 36], we do not consider weights representing transplant utilities. We justify our setting as follows. We first recall that allowing 3-way exchanges [1] or weights [19] makes the problem of computing an optimal exchange scheme in a given round NP-hard; with current technology it would then not be possible to perform an experimental study on the scale that we do. Furthermore, some countries, such as France and Hungary, are legally bound to using only 2-way exchanges. Hence, assuming only 2-way exchanges is not unrealistic either. Moreover, in most of the existing KEPs, the primary objective does not involve any weights and is still to maximize the number of kidney transplants [15].

In Sect. 4 we describe the algorithm, called Lex-Min, that we used for computing maximum matchings that lexicographically minimize the country deviations from a given target allocation. As we will explain, this algorithm can also be used for computing maximum matchings that minimize only the largest country deviation from a given target allocation. For the correctness proof and a running time analysis of the algorithm we refer to [12] (see also [10]).

In Sect. 5 we discuss the simulations in more detail, and in Sect. 6 we present the results of our simulations. As mentioned, we conduct simulations for up to 15 countries, in contrast to the previous studies [16, 36] for 3–4 countries. Moreover, we do this both for equal and varying country sizes, and for a large variety of different solution concepts. Namely, our target allocations are prescribed by four hard-to-compute solution concepts: the Shapley value, nucleolus, Banzhaf value and tau value,Footnote 1 and two easy-to-compute solution concepts: the aforementioned benefit value, which coincides with the Gately point if the latter is unique, and a natural variant of the benefit value, the contribution value.Footnote 2 As mentioned, we define all these concepts in Sect. 2.

Our simulations show that a credit system using lexicographically minimal maximum matchings instead of ones that minimize only the largest country deviation from a given target allocation makes an IKEP up to 54% more balanced, without decreasing the overall number of transplants. The exact improvement depends on which solution concept is used. In our experiments, the Banzhaf and Shapley values yield the best results, namely, on average, a deviation of up to 0.52% from the target allocation. However, the differences between the solution concepts are small: all the other solution concepts stay within 1.23% of the target allocation, and the choice of solution concept will be up to the policy makers of the IKEP.

We finish our simulations by examining a new approach for incorporating credits that has not been proposed in the literature before. Namely, it is also natural to let the solution concepts prescribe an allocation for a credit-adjusted game, where the credits are incorporated into the value function of the game directly. As explained in Sect. 3, where we introduce this approach after first describing the original model, only the Banzhaf value may prescribe different allocations. For all the other solution concepts that we consider, both the original and new credit system yield exactly the same target allocations. Our simulations show, however, that the modified Banzhaf value yields, on average, a deviation of at most 0.48% from the target allocation, a slight improvement over the best result (0.52%) under the original credit system.

In Sect. 7 we evaluate some other aspects of our simulations. First, we show that IKEPs lead to a significantly larger number of total kidney transplants than the total number of transplants of the KEPs of the individual countries. Second, we show that, although theoretically country credits may build up over time as illustrated with an example in Sect. 3, this situation does not happen in any of our simulations. Third, we evaluate computational time issues in our simulations, showing that the generation of the partitioned matching games is the most expensive operation in our simulations. Fourth, we evaluate a number of game-theoretic properties: core stability aspects, convexity and quasibalancedness. Finally, in Sect. 8, we give directions for future work.

2 Cooperative game theory

We model rounds of IKEPs as partitioned matching games [19]. In this section we define these games. We first provide relevant definitions from cooperative game theory that we will need in the remainder of our paper.

A (cooperative) game is a pair \((N,v)\), where N is a set \(\{1,\ldots ,n\}\) of players and \(v: 2^N\rightarrow \mathbb {R}_+\) is a value function with \(v(\emptyset ) = 0\). A coalition is a subset \(S\subseteq N\). If \(v(N)\ge v(S_1)+\ldots + v(S_r)\) for every partition \((S_1,\ldots ,S_r)\) of N, then the players have an incentive to form the grand coalition N. A partitioned matching game [19] is a game \((N,v)\) on an undirected graph \(D=(V,E)\) with a positive edge weighting w and a partition \((V_1,\ldots ,V_n)\) of V. For \(S \subseteq N\), we let \(V(S)=\bigcup _{p \in S} V_p\). A matching M in a graph is a set of edges such that no two edges in M have an end-vertex in common. The weight of M is \(w(M)=\sum _{e\in M}w(e)\). The value v(S) of coalition S is the maximum weight of a matching in the subgraph of D induced by V(S). If \(V_p=\{p\}\) for \(p=1,\ldots ,n\), then we obtain the classical matching game (see, for example, [17, 27, 33, 34, 37]). We will mainly consider uniform matching games, that is, games with \(w(e)=1\) for every \(e\in E\). Now, v(S) becomes the maximum size of a matching in the subgraph of D induced by V(S), and in particular \(v(N)=\mu\), where \(\mu\) is the size of a maximum matching in D.
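As a concrete illustration, the value function of a uniform partitioned matching game can be computed by brute force on small instances. The sketch below is our own illustration (not code from the paper); the toy graph and country partition are hypothetical.

```python
from itertools import combinations

def max_matching_size(vertices, edges):
    # Brute force: try edge subsets from large to small and return the
    # first size at which some subset is a matching (pairwise disjoint edges).
    usable = [e for e in edges if e[0] in vertices and e[1] in vertices]
    for k in range(len(usable), 0, -1):
        for cand in combinations(usable, k):
            ends = [u for e in cand for u in e]
            if len(ends) == len(set(ends)):
                return k
    return 0

def make_value_function(parts, edges):
    # v(S) = maximum matching size in the subgraph induced by V(S),
    # i.e. the uniform partitioned matching game of this section.
    def v(S):
        verts = set().union(*(parts[p] for p in S)) if S else set()
        return max_matching_size(verts, edges)
    return v

# Hypothetical instance: the path i1-i2-i3-i4 split over three countries.
parts = {1: {"i1"}, 2: {"i2", "i3"}, 3: {"i4"}}
edges = [("i1", "i2"), ("i2", "i3"), ("i3", "i4")]
v = make_value_function(parts, edges)
print(v({1, 2, 3}))  # 2, realized by the matching {i1i2, i3i4}
print(v({2}))        # 1, the internal edge i2i3
```

Brute force is exponential in the number of edges, so this is only for building intuition; real KEP solvers use polynomial-time matching algorithms.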

The central problem in cooperative game theory is how to distribute v(N) amongst the players in such a way that players are not inclined to leave the grand coalition. In this context, an allocation is a vector \(x \in \mathbb {R}^{{n}}\) with \(x(N) = v(N)\), where we write \(x(S)=\sum _{p\in S}x_p\) for a set \(S\subseteq N\); hence, \(x_p\) prescribes the part of v(N) that is allocated to player p. An allocation x is said to be an imputation if x is individually rational, that is, \(x_p\ge v(\{p\})\) for every \(p\in N\). A solution concept prescribes a set of “fair” allocations for cooperative games, where the notion of fairness depends on the context. We now provide definitions of the solution concepts that are relevant for our work.

The core of a game \((N,v)\) consists of all allocations \(x \in \mathbb {R}^{{n}}\) with \(x(S)\ge v(S)\) for every \(S\subseteq N\). Core allocations offer no incentive for a group of players to leave the grand coalition N and form a coalition on their own, so they ensure that N is stable. However, games may have an empty core.

We now define the nucleolus of a cooperative game. In order to do this we need some more terminology. For an allocation x and a non-empty coalition \(S \subsetneq N\) in a game \((N,v)\), we define the excess \(e(S,x):= x(S)-v(S)\). We obtain the excess vector \(e(x) \in \mathbb {R}^{2^n-2}\) by ordering the \(2^n-2\) entries in a non-decreasing sequence. The nucleolus of a game \((N,v)\) is the unique allocation [41] that lexicographically maximizes e(x) over the set of imputations. Note that the nucleolus is not defined if the set of imputations is empty. However, every partitioned matching game has a nonempty set of imputations. The nucleolus is a core allocation whenever the core is nonempty.

The Shapley value \(\phi (N,v)\) of a game \((N,v)\), introduced in [42], is one of the best-known solution concepts. It is defined by setting for every \(p\in N\),

$$\begin{aligned} \phi _p(N,v) = \displaystyle \sum _{S \subseteq N\backslash \{p\}} \frac{|S|!(n-|S|-1)!}{n!}\bigg (v(S\cup \{p\})-v(S)\bigg ). \end{aligned}$$
(1)

Unlike the nucleolus, the Shapley value does not necessarily belong to the core if the core is nonempty. This also holds for (partitioned) matching games (see [11] for a small example).
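Equation (1) can be evaluated directly by enumerating all coalitions; this enumeration is exponential in n, which is precisely why the Shapley value is among the hard-to-compute concepts. A minimal sketch on a hypothetical 3-player game (the values below are our own toy example: maximum matching sizes on the path i1-i2-i3-i4 with countries {i1}, {i2,i3}, {i4}):

```python
from itertools import combinations
from fractions import Fraction
from math import factorial

def shapley(players, v):
    # Eq. (1): phi_p = sum over coalitions S avoiding p of
    # |S|!(n-|S|-1)!/n! * (v(S + p) - v(S)).
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = Fraction(0)
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
                total += w * (v(set(S) | {p}) - v(set(S)))
        phi[p] = total
    return phi

# Hypothetical toy game values, stated explicitly.
vals = {frozenset(): 0, frozenset({1}): 0, frozenset({2}): 1,
        frozenset({3}): 0, frozenset({1, 2}): 1, frozenset({1, 3}): 0,
        frozenset({2, 3}): 1, frozenset({1, 2, 3}): 2}
v = lambda S: vals[frozenset(S)]
phi = shapley([1, 2, 3], v)
print(phi)  # {1: Fraction(1, 3), 2: Fraction(4, 3), 3: Fraction(1, 3)}
```

Note that the Shapley payoffs sum to v(N) = 2, as required for an allocation.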

The unnormalized Banzhaf value \(\psi (N,v)\) of a game \((N,v)\) was introduced in [7] and is defined by setting for every \(p\in N\),

$$\begin{aligned} \psi _p(N,v):=\displaystyle \sum _{S \subseteq N\backslash \{p\}} \frac{1}{2^{n-1}}\bigg (v(S\cup \{p\})-v(S)\bigg ). \end{aligned}$$
(2)

Note that \(\psi (N,v)\) may not be an allocation (see e.g. [49]). The (normalized) Banzhaf value \(\overline{\psi }(N,v)\) of a game \((N,v)\) rectifies this and is defined by setting for every \(p\in N\),

$$\begin{aligned} \overline{\psi }_p(N,v):=\displaystyle \frac{\psi _p(N,v)}{\sum _{q \in N}\psi _q(N,v)} \cdot v(N). \end{aligned}$$
(3)

Whenever we speak about the Banzhaf value in our paper, we will mean \(\overline{\psi }(N,v)\).
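Equations (2) and (3) can likewise be evaluated by enumeration. The game below is the same hypothetical 3-player toy example used for the Shapley value; note how the unnormalized values need not sum to v(N), which is exactly what the normalization in (3) repairs.

```python
from itertools import combinations
from fractions import Fraction

def banzhaf_unnormalized(players, v):
    # Eq. (2): uniform average of marginal contributions of p over
    # the 2^(n-1) coalitions S avoiding p.
    n = len(players)
    psi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = Fraction(0)
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                total += v(set(S) | {p}) - v(set(S))
        psi[p] = total / 2 ** (n - 1)
    return psi

def banzhaf(players, v):
    # Eq. (3): rescale the unnormalized values so they sum to v(N).
    psi = banzhaf_unnormalized(players, v)
    scale = Fraction(v(set(players))) / sum(psi.values())
    return {p: x * scale for p, x in psi.items()}

# Hypothetical toy game values (our own illustration).
vals = {frozenset(): 0, frozenset({1}): 0, frozenset({2}): 1,
        frozenset({3}): 0, frozenset({1, 2}): 1, frozenset({1, 3}): 0,
        frozenset({2, 3}): 1, frozenset({1, 2, 3}): 2}
v = lambda S: vals[frozenset(S)]
raw = banzhaf_unnormalized([1, 2, 3], v)
norm = banzhaf([1, 2, 3], v)
print(sum(raw.values()))   # 7/4: not an allocation
print(sum(norm.values()))  # 2 = v(N), by construction
```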

We now define the tau value [45]. Let \((N,v)\) be a game. For \(p\in N\), let \(b_p = v(N) - v(N \setminus \{p\})\) be the utopia payoff for p. This gives us a vector \(b\in \mathbb {R}^n\). For \(p\in N\) and \(S\subseteq N\) with \(p\in S\), let \(R(S,p):=v(S)-\sum _{q \in S{\setminus }\{p\}}b_q\) be what remains for p if the other players in S leave S with their utopia payoffs. For \(p\in N\), we let \(a_p:= \max _{S \ni p} R(S,p)\). This gives us a vector \(a\in \mathbb {R}^n\). We say that \((N,v)\) is quasibalanced if both \(a\le b\) (that is, \(a_p\le b_p\) for every \(p\in N\)) and \(a(N) \le v(N) \le b(N)\). For a quasibalanced game \((N,v)\), the tau value \(\tau\) is defined by setting for every \(p\in N\),

$$\begin{aligned} \tau _p:=\gamma a_p + (1-\gamma )b_p, \end{aligned}$$

where \(\gamma \in [0,1]\) is determined by the condition \(\tau (N) = v(N)\). Note that \(\gamma\) is unique unless \(a=b\), in which case \(\tau =a\). The tau value is not defined for games that are not quasibalanced.
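The definitions above translate into a short computation: build b, build a from the remainders R(S,p), check quasibalancedness, and solve for \(\gamma\). A sketch on the same hypothetical 3-player toy game as before; the function returns None when the game is not quasibalanced, mirroring the fact that the tau value is then undefined.

```python
from itertools import combinations
from fractions import Fraction

def tau_value(players, v):
    # Utopia vector b, minimal-rights vector a, then tau = g*a + (1-g)*b
    # with g chosen so that tau(N) = v(N).
    N = set(players)
    b = {p: Fraction(v(N) - v(N - {p})) for p in players}
    a = {}
    for p in players:
        rems = []
        for k in range(len(players)):
            for rest in combinations([q for q in players if q != p], k):
                S = set(rest) | {p}
                rems.append(v(S) - sum(b[q] for q in S if q != p))
        a[p] = max(rems)
    sa, sb = sum(a.values()), sum(b.values())
    if any(a[p] > b[p] for p in players) or not sa <= v(N) <= sb:
        return None  # not quasibalanced: tau value undefined
    if sa == sb:
        return a
    g = (sb - v(N)) / (sb - sa)  # from tau(N) = g*a(N) + (1-g)*b(N) = v(N)
    return {p: g * a[p] + (1 - g) * b[p] for p in players}

# Hypothetical toy game values (our own illustration).
vals = {frozenset(): 0, frozenset({1}): 0, frozenset({2}): 1,
        frozenset({3}): 0, frozenset({1, 2}): 1, frozenset({1, 3}): 0,
        frozenset({2, 3}): 1, frozenset({1, 2, 3}): 2}
v = lambda S: vals[frozenset(S)]
tau = tau_value([1, 2, 3], v)
print(tau)  # {1: Fraction(1, 3), 2: Fraction(4, 3), 3: Fraction(1, 3)}
```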

In general, all the above solution concepts may require exponential time to compute, assuming that the input is described by an underlying (weighted) graph, as in the case of partitioned matching games. We now define two easy-to-compute solution concepts. To do this, we first define the surplus of a game \((N,v)\) as

$$\begin{aligned} \text{ surp }=v(N) - \sum _{p \in N} v(\{p\}). \end{aligned}$$

A game \((N,v)\) is said to be essential if \(\text{ surp }>0\). Note that an essential game has more than one imputation. If \(\text{ surp }=0\), then the allocation \(x\in \mathbb {R}^n\) with \(x_p=v(\{p\})\) for every \(p\in N\) is the unique imputation, whereas the set of imputations is empty if \(\text{ surp }<0\). As mentioned, partitioned matching games have a nonempty set of imputations, but they may not be essential; consider, for instance, a matching game defined on a graph consisting of two non-adjacent vertices.

For \(p\in N\) we can allocate \(v(\{p\})+\alpha _p\cdot \text{ surp }\) for some \(\alpha \in \mathbb {R}^n\) with \(\sum _{p\in N}\alpha _p=1\). We define two solution concepts that each correspond to a different \(\alpha\).

First, we obtain the known benefit value [36] by setting for each \(p\in N\),

$$\begin{aligned} \alpha _p=\frac{v(N) - v(N \setminus \{p\})-v(\{p\})}{\sum _{p\in N}(v(N) - v(N \setminus \{p\})-v(\{p\}))}. \end{aligned}$$

The benefit value is not defined if \(\sum _{p\in N}(v(N) - v(N {\setminus } \{p\})-v(\{p\})) = 0\).

Moreover, partitioned matching games are superadditive, that is, for every two disjoint coalitions S and T, it holds that \(v(S \cup T) \ge v(S) + v(T)\). As shown by Staudacher and Anwander [43], this means that the benefit value coincides with the Gately point [31],Footnote 3 as long as the Gately point is unique. For superadditive games, the latter holds if there exists at least one player p with \(v(N) - v(N {\setminus } \{p\})-v(\{p\})>0\) [43]. For superadditive games that do not satisfy this condition, we have for every \(p\in N\) that \(v(N)-v(N \setminus \{p\})-v(\{p\})=0\), and thus the benefit value does not exist. Moreover, the benefit value coincides with the tau value when the game is convex [50], that is, for every two coalitions S and T it holds that \(v(S \cup T) + v(S \cap T) \ge v(S) + v(T)\). This condition implies superadditivity. However, already any (uniform) matching game that contains a 3-vertex path uvw as a subgraph is not convex: take \(S=\{u,v\}\) and \(T=\{v,w\}\) and note that \(v(S\cup T)+v(S\cap T)=1<2=v(S)+v(T)\).Footnote 4

Second, we obtain a new solution concept, the contribution value, by setting for each \(p\in N\),

$$\begin{aligned} \alpha _p=\frac{v(N) - v(N \setminus \{p\})}{\sum _{p\in N}(v(N) - v(N \setminus \{p\}))}. \end{aligned}$$

The contribution value is not defined if \(\sum _{p\in N}(v(N) - v(N {\setminus } \{p\})) = 0\). From their definitions, we note that both the benefit value and the contribution value can be computed in polynomial time.
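Both easy-to-compute concepts fit one template: allocate \(v(\{p\})+\alpha _p\cdot \text{ surp }\) with the stated \(\alpha\). A sketch, again on the hypothetical 3-player toy game used only for illustration; both functions return None when the corresponding denominator vanishes, i.e. when the value is undefined.

```python
from fractions import Fraction

def _allocate(players, v, num):
    # v({p}) + alpha_p * surp, with alpha_p = num_p / sum_q num_q.
    N = set(players)
    surp = v(N) - sum(v({p}) for p in players)
    denom = sum(num.values())
    if denom == 0:
        return None  # the value is undefined in this case
    return {p: v({p}) + Fraction(num[p], denom) * surp for p in players}

def benefit_value(players, v):
    N = set(players)
    return _allocate(players, v,
                     {p: v(N) - v(N - {p}) - v({p}) for p in players})

def contribution_value(players, v):
    N = set(players)
    return _allocate(players, v,
                     {p: v(N) - v(N - {p}) for p in players})

# Hypothetical toy game values (our own illustration).
vals = {frozenset(): 0, frozenset({1}): 0, frozenset({2}): 1,
        frozenset({3}): 0, frozenset({1, 2}): 1, frozenset({1, 3}): 0,
        frozenset({2, 3}): 1, frozenset({1, 2, 3}): 2}
v = lambda S: vals[frozenset(S)]
bv = benefit_value([1, 2, 3], v)
cv = contribution_value([1, 2, 3], v)
print(bv)  # {1: Fraction(1, 3), 2: Fraction(4, 3), 3: Fraction(1, 3)}
print(cv)  # {1: Fraction(1, 4), 2: Fraction(3, 2), 3: Fraction(1, 4)}
```

Both computations clearly run in polynomial time given oracle access to v, in line with the remark above.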

Finally, we observe that there exist even small matching games for which the tau value, benefit value and contribution value do not exist and for which the Gately point is not unique, while the set of imputations has size larger than 1. Namely, let D be the triangle with unit edge weights: then \(v(N)-v(N\setminus \{p\})-v(\{p\})=1-1-0=0\) for every \(p\in N\), while every \(x\ge 0\) with \(x(N)=1\) is an imputation.

3 Our model

We model a KEP as follows. A compatibility graph is a directed graph \(D=(V,A)\) with an arc weighting w. Each vertex in V is a patient–donor pair. There is an arc from patient–donor pair i to patient–donor pair j if and only if the donor of pair i is compatible with the patient of pair j. The associated weight \(w_{ij}\) indicates the utility of the transplantation. An exchange cycle is a directed cycle C in D. The weight of a cycle C is the sum of the weights of its arcs. An exchange scheme X is a union of pairwise vertex-disjoint exchange cycles in D. The weight of X is the sum of the weights of its cycles.

A KEP operates in rounds. Each round has its own compatibility graph, which is determined by the current pool of patient–donor pairs. In each round, the goal is to find an exchange scheme of maximum weight, subject to a fixed exchange bound \(\ell\), which is an upper bound on the length of the exchange cycles that may be used.

We obtain an IKEP by partitioning V into subsets \(V_1,\ldots ,V_n\), where n is the number of countries involved and for \(p\in \{1,\ldots ,n\}\), \(V_p\) is the set of patient–donor pairs of country p. The objective is still to find an exchange scheme of D that has maximum weight subject to the exchange bound \(\ell\). In this setting, we must now in addition ensure that the countries accept the proposed exchange schemes.

Assumptions

As explained in Sect. 1, we set \(\ell =2\) and \(w\equiv 1\). As \(\ell =2\), we can make \(D=(V,A)\) undirected by adding an edge between two vertices i and j if and only if both \((i,j)\) and \((j,i)\) are in A (see Fig. 1). So, from now on, compatibility graphs are undirected graphs.
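With \(\ell =2\), this conversion keeps exactly the mutual arc pairs. A minimal sketch (the arc list is hypothetical, loosely modelled on Fig. 1):

```python
def to_undirected(arcs):
    # Edge ij exists iff both arcs (i,j) and (j,i) are present,
    # i.e. the two pairs can perform a 2-way exchange.
    arcset = set(arcs)
    return {frozenset(a) for a in arcset if (a[1], a[0]) in arcset}

arcs = [("i1", "i2"), ("i2", "i1"), ("i2", "i3"),
        ("i4", "i5"), ("i5", "i4")]
edges = to_undirected(arcs)
print(sorted(tuple(sorted(e)) for e in edges))
# [('i1', 'i2'), ('i4', 'i5')] -- the lone arc i2->i3 yields no edge
```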

Fig. 1
figure 1

A directed compatibility graph (left) which we make undirected (right). Here, \({{\mathcal {M}}}=\{M\}\), where \(M=\{i_1i_2,i_4i_5\}\). If \(V_1=\{i_1,i_2,i_3\}\) and \(V_2=\{i_4,i_5\}\), then \(s_1(M)=s_2(M)=2\). That is, both countries receive two kidney transplants if maximum matching M is used, so all transplants are “in-house”

We now explain the recent credit system from [19, 36] for IKEPs. Let \(N=\{1,\ldots ,n\}\) be the set of countries. For some \(h\ge 1\), let \(D_h\) be the compatibility graph in round h with country vertex sets \(V_1^h,\ldots , V_n^h\). Let \(\mu _h\) be the size of a maximum matching in \(D_h\), so \(2\mu _h\) is the maximum number of kidney transplants possible in round h. Hence, an allocation for round h is a vector \(x^h \in \mathbb {R}^{{n}}\) with \(x^h(N) = 2\mu _h\). That is, \(x^h_p\) describes the share of \(x^h(N)=2\mu _h\) that is allocated to country p. We can only allocate integer numbers (kidneys), but nevertheless we do allow allocations \(x^h\) to be non-integer, as we will explain later.

Assume that we are given a “fair” allocation \(y^h\) for round \(h\ge 1\), together with a credit function \(c^h:N\rightarrow \mathbb {R}\), which satisfies

$$\begin{aligned} \sum _{p\in N}c^h_p=0. \end{aligned}$$

We let \(c^1\equiv 0\) and define \(c^h\) for \(h\ge 2\) below. For \(p=1,\ldots ,n\), we set \(x^h_p=y^h_p+c^h_p.\) Then \(x^h\) is an allocation, as \(y^h\) is an allocation and \(\sum _{p\in N}c_p^h=0\). We call \(x^h\) the target allocation for round h and \(y^h\) the initial allocation for round h.

We now define \(c^h\) for \(h\ge 2\). Let \({{\mathcal {M}}}^h\) be the set of all maximum matchings of \(D_h\). Say we choose a matching \(M^h\in {{\mathcal {M}}}^h\). Then the set \(\{(i,j)\in A \;|\; ij\in M^h,\, j\in V_p^h\}\) consists of all kidney transplants in round h that involve patients in country p (with donors both from country p and from other countries). We let \(s_p(M^h)\) denote the size of this set, or equivalently (see Fig. 1),

$$\begin{aligned} s_p(M^h)=|\{j\in V_p^h |\; ij\in M^h\}|. \end{aligned}$$

We now compute a new credit function \(c^{h+1}\) by setting \(c^{h+1}_p=x^h_p-s_p(M^h)\) and note that \(\sum _{p\in N}c_p^{h+1}=0\), as required. For round \(h+1\), a new initial allocation \(y^{h+1}\) is given. For \(p=1,\ldots ,n\), we set \(x_p^{h+1}=y_p^{h+1}+c_p^{h+1}\) and repeat the process.
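One round of this credit system is a two-line update: form the target \(x^h=y^h+c^h\), then carry the deviation \(x^h_p-s_p(M^h)\) into the next round. The sketch below replays round 1 of the example given later in this section (Fig. 2), with the Shapley value as initial allocation:

```python
from fractions import Fraction

def target_allocation(y, c):
    # x^h = y^h + c^h: initial "fair" allocation plus carried-over credits.
    return {p: y[p] + c[p] for p in y}

def next_credits(x, s):
    # c^{h+1}_p = x^h_p - s_p(M^h); these always sum to 0 because both
    # x^h and s(M^h) distribute the same 2*mu_h transplants.
    return {p: x[p] - s[p] for p in x}

c1 = {1: Fraction(0), 2: Fraction(0), 3: Fraction(0)}
y1 = {1: Fraction(2, 3), 2: Fraction(8, 3), 3: Fraction(2, 3)}
x1 = target_allocation(y1, c1)
s1 = {1: 1, 2: 2, 3: 1}  # transplants per country under the chosen M^1
c2 = next_credits(x1, s1)
print(c2)  # {1: Fraction(-1, 3), 2: Fraction(2, 3), 3: Fraction(-1, 3)}
print(sum(c2.values()))  # 0
```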

Note that for every country \(p\in N\) and round \(h\ge 2\), it holds that

$$\begin{aligned} c^h_p=\sum _{t=1}^{h-1}(y_p^t - s_p(M^t)), \end{aligned}$$

so credits are in fact the accumulation of the deviations from the initial allocations. Hence, credits for a country can build up over time, irrespective of our choice of initial allocations. Later in this section, we will give an explicit example where this happens. However, such situations did not occur in any of our simulations where we used the credit function (see Sect. 7.2).

We now specify our choices for the initial allocations \(y^h\) and maximum matchings \(M^h\in {{\mathcal {M}}}^h\).

Choosing the initial allocation y For prescribing our initial allocations we use the singleton solution concepts from Sect. 2. That is, we use four hard-to-compute solution concepts: the Shapley value, nucleolus, Banzhaf value and tau value, and two easy-to-compute solution concepts: the benefit value and the contribution value.

We use the same solution concept consistently for all rounds, with one exception. Recall that a partitioned matching game may not be quasibalanced, in which case the tau value is not defined. If the game is not quasibalanced, then for our simulations involving the tau value, we use the benefit value instead.Footnote 5 As we will see in Sect. 5, where we describe our simulations in more detail, we had to make this replacement in only 0.04% of our simulations.

Naturally, we could have replaced the tau value by a different solution concept. However, we chose the benefit value, as for convex games the tau and benefit values coincide. Moreover, they may even coincide if the game is not convex. Indeed, as we will see in Sect. 7.5, overall only 4.14% of the partitioned matching games in our simulations turned out to be convex, but in 31.6% of the non-convex cases, the tau and benefit values still coincided.

Finally, recall that \(x(N)=2\mu\) for an allocation x, as we count the number of kidney transplants instead of the maximum number \(\mu\) of patient–donor swaps. To resolve this discrepancy, we multiply the allocations prescribed by the above six solution concepts by a factor of two.

Choosing the solution \({\varvec{M}}\) For a maximum matching \(M\in {{\mathcal {M}}}\) in a partitioned matching game \((N,v)\) we let

$$\begin{aligned} \delta (M)= (|x_{p_1}-s_{p_1}(M)|, \dots , |x_{p_n}-s_{p_n}(M)|) \end{aligned}$$

be the vector obtained by reordering the components \(|x_p-s_p(M)|\) non-increasingly. We say that M is lexicographically minimal for an allocation x if \(\delta (M)\) is lexicographically minimal over all matchings in \({{\mathcal {M}}}\). Every lexicographically minimal matching in \({{\mathcal {M}}}\) is a matching that minimizes

$$\begin{aligned} d_1=\max _{p\in N} \{|x_p-s_p(M)|\}, \end{aligned}$$

but the reverse might not be true.

In our simulations, we choose in each round a maximum matching that is lexicographically minimal for the target allocation. To examine the effect of this, we will perform exactly the same simulations when we choose a maximum matching that only minimizes \(d_1\). In Sect. 4 we present the polynomial-time algorithm for computing these maximum matchings. Moreover, as a baseline approach, we will do the same simulations when an arbitrary maximum matching is chosen.
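Comparing candidate maximum matchings by \(\delta (M)\) amounts to a lexicographic comparison of sorted deviation vectors, which Python tuple comparison gives for free. A sketch using the s-vectors of round 2 of the example below (two maximum matchings and target \(x^2=(1,1,0)\)):

```python
def deviation_vector(x, s):
    # delta(M): the deviations |x_p - s_p(M)| sorted non-increasingly.
    return tuple(sorted((abs(x[p] - s[p]) for p in x), reverse=True))

def pick_lex_min(x, s_vectors):
    # Tuple comparison is lexicographic, so min() finds a matching
    # with lexicographically minimal delta(M).
    return min(s_vectors, key=lambda s: deviation_vector(x, s))

x = {1: 1, 2: 1, 3: 0}
s_M2 = {1: 1, 2: 1, 3: 0}      # s-vector of M^2 = {j1 j2}
s_M2star = {1: 1, 2: 0, 3: 1}  # s-vector of M^2_* = {j1 j3}
print(deviation_vector(x, s_M2))      # (0, 0, 0)
print(deviation_vector(x, s_M2star))  # (1, 1, 0)
print(pick_lex_min(x, [s_M2, s_M2star]) is s_M2)  # True
```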

Fig. 2
figure 2

An example of the first two rounds of an IKEP with \(N=\{1,2,3\}\). Round 1 is displayed on the left, and round 2 on the right. In this example, both rounds are the same, irrespective of the solution concept that we use for the initial allocations. This is because round 1 has a unique maximum matching

Example. In Fig. 2, the compatibility graphs for two rounds of an IKEP consisting of three countries are displayed, so \(N=\{1,2,3\}\).

First assume that we use the Shapley value for the initial allocations. Let \(V_1=\{i_1\}\), \(V_2=\{i_2, i_3\}\) and \(V_3=\{i_4\}\) in round 1. Note that \({{\mathcal {M}}}^1=\{M^1\}\) with \(M^1=\{i_1i_2,i_3i_4\}\). So, in round 1, we need to use \(M^1\), as \(M^1\) is the only maximum matching in this round. Recall that \(c^1=(0,0,0)\). Then \(x^1=y^1=\left( \frac{2}{3}, \frac{8}{3}, \frac{2}{3}\right)\). Moreover, \(s(M^1)=(1,2,1)\) and \(c^2=x^1-s(M^1)=(-\frac{1}{3}, \frac{2}{3}, -\frac{1}{3})\). After round 1, all patient–donor pairs \(i_1,\ldots , i_4\) have been helped and leave the IKEP. Let \(V_1=\{j_1\}\), \(V_2=\{j_2\}\) and \(V_3=\{j_3\}\) in round 2. Note that \({{\mathcal {M}}}^2=\{M^2,M^2_*\}\) with \(M^2=\{j_1j_2\}\) and \(M^2_*=\{j_1j_3\}\). So, in round 2, we must choose between using \(M^2\) or \(M^2_*\), and this choice will be determined by which maximum matching is closer to the target allocation \(x^2\). Note that \(y^2=\left( \frac{4}{3}, \frac{1}{3}, \frac{1}{3}\right)\). Hence, \(x^2=y^2+c^2=(1,1,0)\) and we must choose \(M^2=\{j_1j_2\}\), which has \(s(M^2)=(1,1,0)\). Consequently, \(c^3=x^2-s(M^2)=(0,0,0)\).

Using, for example, the nucleolus for the initial allocations yields the same outcome for round 1 (as we must use \(M^1\)). However, in round 2, \(y^2=(2,0,0)\). Hence, \(x^2=y^2+c^2=\left( \frac{5}{3}, \frac{2}{3}, -\frac{1}{3}\right)\), meaning that again we must pick \(M^2\). But now, \(c^3=x^2-s(M^2)=\left( \frac{2}{3}, -\frac{1}{3}, -\frac{1}{3}\right)\). This means that in round 3, the nucleolus may lead to choosing a different maximum matching from \({{\mathcal {M}}}^3\). In that case, different patient–donor pairs may leave the IKEP, and consequently, the compatibility graph for round 4 may differ from the compatibility graph that would have resulted had the Shapley value been used.

We now return to our remark regarding the possible accumulation of credits, recalling that for every country \(p\in N\) and round \(h\ge 2\), it holds that \(c^h_p=\sum _{t=1}^{h-1}(y_p^t - s_p(M^t))\). Suppose that round 3 and every future round look exactly the same as round 2, and suppose that the nucleolus is used for the initial allocations. Then \(c^h_{1}=c^{h-1}_{1}+1\) for \(h\ge 3\). This is clearly not desirable. We monitor whether this situation happens in our simulations. However, as mentioned, we did not see this kind of behaviour occur (see Sect. 7.2). \(\diamond\)

Alternative credit system A new approach, which preserves superadditivity and convexity, is to define for some round h in an IKEP the credit-adjusted game \(\overline{v}^h\) with

$$\begin{aligned} \overline{v}^h(S) = v^h(S) + \sum _{p \in S}c^h_p \end{aligned}$$

for every \(S\subseteq N\). For solution concepts that are covariant (under strategic equivalence), i.e., that prescribe the same set of allocations for \((N,v)\) and for \((N,\gamma v + \delta )\) for every \(\gamma >0\) and every \(\delta \in \mathbb {R}^n\), this credit-based system works in exactly the same way as before. All solution concepts that we consider have this property except one: the (normalized) Banzhaf value. Therefore, in order to investigate whether this alternative way of incorporating credits improves the stability of an IKEP, we only have to perform an extra set of simulations for the (normalized) Banzhaf value.

4 Computing a lexicographically minimal maximum matching

Let \((N,v)\) be a partitioned matching game with a set V of patient–donor pairs. Let \({{\mathcal {M}}}\) be the set of maximum matchings in the corresponding compatibility graph D, and let x be an allocation. In this section we will give our algorithm Lex-Min that we use for computing a maximum matching from \({{\mathcal {M}}}\) that is lexicographically minimal for x. This algorithm computes, for a partitioned matching game \((N,v)\) and an allocation x, strictly decreasing values \(d_1,\ldots , d_t\) for some integer \(t\ge 1\), and returns a matching \(M\in {{\mathcal {M}}}\) that is lexicographically minimal for x. For computing \(d_1,\ldots ,d_t\), the algorithm calls the polynomial-time algorithm provided by the following lemma from [12] (see also [10]).

Lemma 1

([12]) Given a partitioned matching game \((N,v)\) on a graph \(G=(V,E)\) with a positive edge weighting w and a partition \({{\mathcal {V}}}\) of V, and intervals \(I_1, \dots , I_n\), it is possible in \(O(|V|^3)\) time to decide if there exists a matching \(M\in {{\mathcal {M}}}\) with \(s_p(M)\in I_p\) for \(p=1,\ldots ,n\), and to find such a matching (if it exists).

Lex-Min

input: a partitioned matching game \((N,v)\) and an allocation x

output: a matching \(M\in \mathcal{M}\) that is lexicographically minimal for x.

Step 1 Compute the smallest number \(d_1 \ge 0\) such that there exists a matching \(M \in \mathcal {M}\) with \(|x_p-s_p(M)| \le d_1 \text { for all } p \in N\).

Step 2 Compute a minimal set \(N_1 \subseteq N\) (with respect to set inclusion) such that there exists a matching \(M \in \mathcal {M}\) with

$$\begin{aligned}&|x_p-s_p(M)| = d_1&\text { for all } p \in N_1\\&|x_p-s_p(M)| < d_1&\text { for all} \;p \in N\setminus N_1. \end{aligned}$$

Step 3 Proceed in a similar way for \(t\ge 1\):

  • while \(N_1 \cup \dots \cup N_t \ne N\) do

  • \(t \leftarrow t+1\).

  • \(d_t \leftarrow\) smallest d such that there exists a matching \(M \in \mathcal {M}\) with

    $$\begin{aligned}&|x_p-s_p(M)| = d_j&\text { for all } p \in N_j, ~j\le t-1\\&|x_p-s_p(M)| \le d_t&\text { for all } p \in N\setminus (N_1 \cup \dots \cup N_{t-1}). \end{aligned}$$
  • \(N_t \leftarrow\) inclusion minimal subset of \(N\setminus (N_1 \cup \dots \cup N_{t-1})\) such that there exists a matching \(M \in \mathcal {M}\) with

    $$\begin{aligned}&|x_p-s_p(M)| = d_j&\text { for all } p \in N_j, ~j\le t-1\\&|x_p-s_p(M)| = d_t&\text { for all } p \in N_t\\&|x_p-s_p(M)| < d_t&\text { for all } p \in N\setminus (N_1 \cup \dots \cup N_{t}). \end{aligned}$$

Step 4 Return a matching \(M\in \mathcal{M}\) with \(|x_p-s_p(M)|=d_j\) for all \(p\in N_j\) and all \(j\in \{1,\ldots ,t\}\).
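To make the objective concrete, the following brute-force Python sketch (our illustration only; it enumerates all matchings of a toy instance and is not the polynomial-time algorithm above) selects, among all maximum matchings, one whose country deviations, sorted in non-increasing order, are lexicographically minimal. The graph, country partition and target allocation are hypothetical.

```python
from itertools import combinations

# Toy instance: a path on 5 patient-donor pairs, partitioned over 3 countries.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
country = {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'C'}
countries = ['A', 'B', 'C']
x = {'A': 1.5, 'B': 1.5, 'C': 1.0}        # hypothetical target allocation

def all_matchings(edges):
    out = [()]
    for k in range(1, len(edges) + 1):
        for sub in combinations(edges, k):
            verts = [u for e in sub for u in e]
            if len(verts) == len(set(verts)):   # no vertex used twice
                out.append(sub)
    return out

matchings = all_matchings(edges)
max_size = max(len(M) for M in matchings)
maximum = [M for M in matchings if len(M) == max_size]   # the set M

def s(M, p):       # number of matched patient-donor pairs of country p
    return sum(1 for e in M for u in e if country[u] == p)

def profile(M):    # deviations |x_p - s_p(M)| sorted non-increasingly
    return sorted((abs(x[p] - s(M, p)) for p in countries), reverse=True)

lexmin_M = min(maximum, key=profile)
```

In this instance an arbitrary maximum matching can leave country C fully unmatched (deviation profile [1.0, 0.5, 0.5]), whereas the lexicographically minimal matching attains the profile [0.5, 0.5, 0.0].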

We say that the countries in a set \(N{\setminus } (N_1\cup \cdots \cup N_{t-1})\) are unfinished and that a country is finished when it is placed in some \(N_t\). Note that Lex-Min terminates as soon as all countries are finished. For a correctness proof and running time analysis of Lex-Min we refer to [12] (see also [10]).

Theorem 1

([12]) The Lex-Min algorithm is correct and runs in \(O(n|V|^3\log |V|)\) time for a partitioned matching game \((N,v)\) with an allocation x.

Note that Lex-Min can also be used to compute a maximum matching M that only minimizes the maximum deviation \(d_1\) from an allocation x; such a matching need not be lexicographically minimal.

5 Simulation details

In this section we describe our simulations in detail. Our goals are

  1. to examine the benefits of using Lex-Min instead of taking a maximum matching that minimizes the largest country deviation \(d_1\) from the target allocation x or just an arbitrary maximum matching,

  2. to test the effect of several (sophisticated) solution concepts.

We therefore perform simulations for a large number of countries, as we explain below. Moreover, we do this in the following two settings where:

  (i) countries all have the same size, and

  (ii) countries have three different sizes with ratio small : medium : large \(=\) 1 : 2 : 3.

Simulation instances We first consider setting (i) which is when all countries have the same size. Using the generator [39]Footnote 6 we obtain 100 compatibility graphs \(D_1,\ldots ,D_{100}\), each with roughly 2000 vertices.Footnote 7

For every \(i\in \{1,\ldots ,100\}\) we do as follows. For every \(n\in \{4,\ldots ,15\}\), we perform simulations for n countries. We first partition \(V(D_i)\) into n arbitrary sets \(V_{i,1},\ldots , V_{i,n}\) that are all of the same size 2000/n (subject to rounding), so \(V_{i,p}\) corresponds to the set of patient–donor pairs of country p.

For round 1, we construct a compatibility graph \(D_i^1(n)\) as a subgraph of \(D_i\) of size roughly 500. So, a quarter of the patient–donor pairs will enter the program in round 1. The remaining patient–donor pairs of \(D_i\) will be added as vertices to the compatibility graph in later rounds, assigned uniformly at random over the remaining rounds. Starting with \(D_i^1(n)\) we run a 6-year IKEP with quarterly matching rounds, that is, a simulation that consists of 24 rounds in total. In this way we obtain 24 compatibility graphs \(D_i^1(n),\ldots , D_i^{24}(n)\).

Let \(M_i^j(n)\) be the maximum matching that we compute for \(D_i^j(n)\). If \(j\le 23\), then we construct \(D_i^{j+1}(n)\) as follows. First, we remove the vertices that are matched by \(M_i^j(n)\); the corresponding patient–donor pairs have been helped. If \(j\ge 4\), then we also remove those vertices from \(D_i^j(n)\) that are not in \(M_i^j(n)\) but that do belong to \(D_i^{j-3}(n)\). This is because in real life, patients may seek alternative treatment or may have died after being in the KEP for a year. Finally, we add the vertices that correspond to the patient–donor pairs that were assigned, in advance of the simulation, to enter the program in round \(j+1\).
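The pool update between rounds can be sketched as follows (a simplified Python illustration; the function and variable names are ours, and the actual implementation is the C++ code in [9]):

```python
def next_pool(pool, matched, entered_round, arrivals, j):
    # Build the round-(j+1) pool from the round-j pool.
    nxt = set()
    for v in pool:
        if v in matched:
            continue              # pair was helped in round j
        if j >= 4 and j - entered_round[v] >= 3:
            continue              # pair was already present in round j-3:
                                  # it leaves after a year unmatched
        nxt.add(v)
    for v in arrivals:            # pairs assigned in advance to round j+1
        entered_round[v] = j + 1
        nxt.add(v)
    return nxt

# Example: pair 1 entered in round 1 and is still unmatched in round 4,
# pair 2 is matched in round 4, and pair 5 enters in round 5.
entered = {1: 1, 2: 1, 3: 2, 4: 4}
pool5 = next_pool({1, 2, 3, 4}, matched={2}, entered_round=entered,
                  arrivals={5}, j=4)
```

Here pair 1 is removed because it has been in the pool for a full year (four quarterly rounds) without being matched, and pair 2 is removed because it was helped.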

A (24-round) simulation instance consists of the data needed to generate a graph \(D_i^1(n)\) and its successors \(D_i^2(n),\ldots ,D_i^{24}(n)\), together with specifications for the initial allocation y and maximum matching M to be used in each round. Our code for obtaining the simulation instances is available in GitHub repository [9], along with the data from [39] describing the compatibility graphs and the seeds used for the randomization.

For setting (ii), where country sizes vary, we use the same 100 compatibility graphs \(D_1,\ldots ,D_{100}\) as before and do exactly the same, except that we impose different restrictions on the sizes of the countries. Namely, we partition each \(V(D_i)\) into n arbitrary sets \(V_{i,1},\ldots , V_{i,n}\), such that approximately n/3 are small, that is, have size roughly 1000/n (subject to rounding); approximately n/3 are medium, that is, have size roughly 2000/n; and approximately n/3 are large, that is, have size roughly 3000/n (three times as large as small).

We now discuss how we computed the initial allocations and maximum matchings.

Initial allocations Recall from Sect. 3 that for the initial allocations y we use the Shapley value, nucleolus, Banzhaf value, tau value, the benefit value and the contribution value. Recall also from Sect. 3 that when y is the tau value, we will replace the tau value by the benefit value if the corresponding partitioned matching game is not quasibalanced. Hence, strictly speaking we use a hybrid tau value, but we only need to make this replacement in 0.04% of the games overall; see Table 4 in Sect. 7.5.

Let \((N,v)\) be a partitioned matching game defined on some compatibility graph \(D_i^j(n)\) with n countries. For a coalition of countries \(S\subseteq N\), v(S) is the size of a maximum matching in the subgraph of \(D_i^j(n)\) induced by the vertices of the countries of S. We compute the size of a maximum matching in such a subgraph efficiently, using the package of [28]. The contribution value and benefit value can then be computed efficiently, directly from their definitions.

For the Shapley value and Banzhaf value, we were still able to implement a naive (brute force) approach relying directly on (1) (see also Table 2 of Sect. 7.3). For the tau value, we first need to compute the vectors a and b. Only b can be computed efficiently, but for computing a we were also still able to use a naive approach that relies directly on its definition. However, a naive approach for computing the nucleolus of partitioned matching games is not feasible for the high number of countries we consider. We therefore use the Lexicographical Descent method of [13], which is the state-of-the-art method in nucleolus computation.Footnote 8

Solutions As mentioned, we aim to examine the benefits of using maximum matchings prescribed by Lex-Min over arbitrary matchings or maximum matchings that minimize the maximum deviation \(d_1\) from the target allocation. For computing an arbitrary maximum matching we use one given to us by the package of [28]. For computing a maximum matching that minimizes only \(d_1\) it suffices to perform only the first step of Lex-Min.

We implemented Lex-Min as follows (for the exact computer code and an explanatory pseudocode, see our GitHub repository [9]). Instead of a binary search for finding a deviation \(d_t\), we performed a greedy search for simplicity. We gradually try to decrease the deviations of the countries in a greedy way, starting with the ones that have the largest deviation. We maintain a set of countries already finished, which initially is the empty set. In every iteration we take one of the unfinished countries, say country p, with the largest deviation \(\delta _p=|x_p-s_p(M)|\), where \(M\in {{\mathcal {M}}}\) is the last maximum matching we computed. We then try to decrease this deviation without allowing a deviation of another unfinished country to increase above \(\delta _p\). If no decrease is possible, then we fix country p and move to the next unfinished country with the largest deviation. If a decrease is possible, then we consider the newly found maximum matching and repeat the process. That is, we consider again a country (possibly country p) with the largest deviation.
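The greedy loop can be sketched as follows (an abstract Python illustration: each maximum matching is represented only by its vector of country scores, and the full list of such vectors stands in for the Lemma 1 oracle; all names and numbers are hypothetical):

```python
x = {'A': 1.5, 'B': 1.5, 'C': 1.0}        # hypothetical target allocation
# Country score vectors s_p(M) of the maximum matchings of some toy pool:
scores = [{'A': 2, 'B': 2, 'C': 0},
          {'A': 2, 'B': 1, 'C': 1},
          {'A': 1, 'B': 2, 'C': 1}]
countries = ['A', 'B', 'C']
dev = lambda M, p: abs(x[p] - M[p])       # deviation |x_p - s_p(M)|

def greedy_lexmin(candidates, countries, dev):
    M = candidates[0]
    finished = {}                         # country -> fixed deviation
    while len(finished) < len(countries):
        unfinished = [p for p in countries if p not in finished]
        # An unfinished country with the largest current deviation:
        p = max(unfinished, key=lambda q: dev(M, q))
        d_p = dev(M, p)
        # Try to decrease p's deviation without letting any unfinished
        # country rise above d_p, keeping finished deviations fixed:
        better = [M2 for M2 in candidates
                  if all(dev(M2, q) == finished[q] for q in finished)
                  and all(dev(M2, q) <= d_p for q in unfinished)
                  and dev(M2, p) < d_p]
        if better:
            M = better[0]                 # improvement found; repeat
        else:
            finished[p] = d_p             # p's deviation cannot decrease
    return M

M_star = greedy_lexmin(scores, countries, dev)
```

Each iteration either improves the current matching or finishes a country, which is the basis of the termination argument.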

We briefly discuss the running time of the greedy variant of Lex-min. In each iteration, either we finish a country, or we shrink the deviation interval of a country by an integer step. Therefore, the number of iterations is upper bounded by the number n of countries plus the sum of the upper integer part of the initial deviations, which is O(|V|). As an application of Lemma 1 takes \(O(|V|^3)\) time, the total running time is \(O((|V|+n)|V|^3)\). This running time can be worse than the running time of the binary search version, which is \(O(n|V|^3\log |V|)\) by Theorem 1, as \(n\log (|V|)\) may be smaller than \(|V|+n\). However, for the instances in our simulations the difference in running time appeared to be negligible.

Finally, as the Lex-Min algorithm uses Lemma 1 as a subroutine, we needed to implement the polynomial-time algorithm provided by Lemma 1 as well, see [12] for a brief description of this algorithm, or [10] for a full description. As explained in [12] (see also [10]), applying Lemma 1 requires solving a maximum weight perfect matching problem. In order to do the latter efficiently, we again used the package of [28].

Credit system We aim to distinguish the effect of Lex-Min from the effect of the credit function c, and also to measure the benefit of Lex-Min over choosing a maximum matching that minimizes \(d_1\) or an arbitrary maximum matching (our baseline). Note that in the baseline case, using the credit function c is meaningless, as in each round we pick an arbitrary maximum matching, independently of what happened in the previous rounds. Hence, we run the same simulations for the following five scenarios, where y is prescribed by one of the six solution concepts: the Shapley value, nucleolus, Banzhaf value, tau value, benefit value and contribution value.

  1. arbitrary: M is an arbitrary maximum matching (one computed by the package of [28]).

  2. d1: M is a maximum matching that minimizes \(d_1\) and \(x=y\).

  3. d1+c: M is a maximum matching that minimizes \(d_1\) and \(x=y+c\).

  4. lexmin: M is the maximum matching returned by Lex-Min and \(x=y\).

  5. lexmin+c: M is the maximum matching returned by Lex-Min and \(x=y+c\).

Additionally, we evaluate the effect of taking credit-adjusted games instead of setting \(x=y+c\). As explained in Sect. 3, this effect can only be measured by taking the Banzhaf value as the initial allocation \(\overline{y}\) for the credit-adjusted games (for all the other solution concepts, we obtain the same outcomes). Hence, this leads to two more scenarios:

  6. \(\overline{d1}\): M is a maximum matching that minimizes \(d_1\) and \(x=\overline{y}\).

  7. \(\overline{\text{lexmin}}\): M is the maximum matching returned by Lex-Min and \(x=\overline{y}\).

Hence, in total, we run the same set of simulations for \({25}+2={27}\) different combinations of scenarios and initial allocations (the arbitrary scenario is independent of the choice of y, so it counts only once).

Computational environment and scale We ran all simulations on a desktop PC with an AMD Ryzen 9 5950X 3.4 GHz CPU and 128 GB of RAM, running Windows 10, with a C++ implementation in Visual Studio. Our code [9] uses the open-source code [8] of the Lexicographical Descent method for computing the nucleolus. The scale of our experiments for IKEPs is unprecedented: our total number of 24-round simulation instances is equal to \(2\times 27 \times 12 \times 100 = 64800\), namely, two different settings (same or varying country sizes), 27 combinations of scenarios and initial allocations, twelve values for the number n of countries, and 100 initial compatibility graphs \(D_i\).

Evaluation measures To measure balancedness we do as follows. After the 24 rounds of a single instance, we will have a total initial allocation \({y^*}\), which is defined as the sum of the 24 initial allocations, and a maximum matching \(M^*\), which is the union of the chosen matchings in each of the 24 rounds. Note that the total number of kidney transplants is \(2|M^*|\). We now define the total relative deviation as

$$\begin{aligned} \frac{\sum _{p \in N} |{y}_p^* - s_p(M^*)|}{2|M^*|}. \end{aligned}$$

Recall that for each triple that consists of a number of countries, a choice of initial allocation and a choice of scenario, we run 100 instances. We take the average of the 100 total relative deviations. This gives us the average total relative deviation.

Apart from using the average total relative deviation as our evaluation measure, we also took the maximum relative deviation, which is defined as

$$\begin{aligned} \frac{\max _{p \in N} |{y}_p^* - s_p(M^*)|}{2|M^*|}, \end{aligned}$$

leading to the average maximum relative deviation as our second evaluation measure.
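For concreteness, both evaluation measures can be computed as follows (the numbers are hypothetical, chosen only to illustrate the formulas):

```python
# Hypothetical end-of-simulation data for one instance with three countries.
y_star = {'A': 120.4, 'B': 118.9, 'C': 121.7}   # total initial allocation y*
s_star = {'A': 122, 'B': 118, 'C': 121}         # s_p(M*) for each country p
M_star_size = 180                               # |M*|; 2|M*| = 360 transplants

# Total relative deviation: summed country deviations over 2|M*|.
total_relative = (sum(abs(y_star[p] - s_star[p]) for p in y_star)
                  / (2 * M_star_size))
# Maximum relative deviation: largest country deviation over 2|M*|.
max_relative = (max(abs(y_star[p] - s_star[p]) for p in y_star)
                / (2 * M_star_size))
```

Averaging these quantities over the 100 instances of a triple yields the average total relative deviation and the average maximum relative deviation.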

6 Main results

Fig. 3
figure 3

Average total relative deviations for the situation where all countries have the same size. The number of countries n is ranging from 4 to 15. The figures on the right side zoom in on the figures from the left side by removing the results for the arbitrary matching scenario

In Fig. 3 we display the main results for the situation where all countries have the same size and when we use the average total relative deviation as our evaluation measure. As expected, using an arbitrary maximum matching in each round makes the kidney exchange scheme significantly more unbalanced, with average total relative deviations above 13.8% for all initial allocations y.

From Fig. 3 we can compute the relative improvement of lexmin+c over d1+c. For example, for \(n=15\), this percentage is \((2.05-0.97)/2.05=\) 52.49% for the tau value, whereas for the other solution concepts it is 45.5% (nucleolus); 44% (benefit value); 41% (contribution value); 40% (Shapley value); and 40% (Banzhaf value). Considering the average improvement over \(n=4,\ldots ,15\) yields percentages of 37% (tau value); 36% (nucleolus); 31% (benefit value); 30% (Shapley value); 27% (Banzhaf value); and 24% (contribution value).

From Fig. 3 we can also compare lexmin+c with lexmin, and d1+c with d1. We see that using c has a substantial effect. Whilst lexmin ensures that allocations stay close to the target allocations, the role of c is to keep the deviations small and to guarantee fairness for the participating countries over a long time period.

Our main conclusion from Fig. 3 is that using lexmin+c yields the lowest average total relative deviation for all six initial allocations y, with larger differences when the number of countries is growing. In Fig. 4 we display the six lexmin+c graphs of Fig. 3 in one plot in order to compare them with each other. As mentioned, the choice of initial allocation is up to the policy makers of the IKEP. However, from Fig. 4 we see that the Shapley value and the Banzhaf value in the lexmin+c scenario consistently provide the smallest deviations from the target allocations (0.52% for \(n=15\)), while the contribution value for \(n \le 12\) and the nucleolus for \(n\ge 13\) perform the worst. The latter result is perhaps somewhat surprising given the sophisticated nature of the nucleolus.

Fig. 4
figure 4

Displaying the six lexmin+c graphs of Fig. 3 in one plot

Fig. 5
figure 5

Comparing the lexmin+c graphs for the Shapley value and Banzhaf value from Fig. 3 with the one for the Banzhaf* value (same country sizes)

Fig. 6
figure 6

Comparing the d1+c graphs for the Shapley value and Banzhaf value from Fig. 3 with the one for the Banzhaf* value (same country sizes)

We now turn to the Banzhaf value for the credit-adjusted games, which we denote as the Banzhaf* value. Recall that choosing any of the other solution concepts as initial allocation will lead to the same results as for the original games, and that only for the Banzhaf value the two different credit systems may give different results. Figure 5 shows that the latter is indeed the case. It displays the lexmin+c graphs for the Shapley value and the (original) Banzhaf value from Fig. 3 and compares them with the lexmin+c graph for the Banzhaf* value. Figure 6 does the same for the three d1+c graphs. Both figures show that the Banzhaf* value behaves better than the Shapley value and the Banzhaf value; however, the differences are very small (at most 0.04% for lexmin+c and 0.19% for d1+c).

If we use our second evaluation measure, the average maximum relative deviation, then we obtain similar results and can draw the same conclusions; we refer to Appendix 1 for the corresponding figures.

We now turn to the situation of varying country sizes, for which we performed the same simulations as before. We can draw exactly the same conclusions, only with different percentages (and therefore we did not perform any additional simulations for varying country sizes). That is, from Fig. 7 we see that using an arbitrary maximum matching in each round makes the kidney exchange scheme significantly more unbalanced, with average total relative deviations above 8.15% for all initial allocations y. Moreover, from Fig. 7 we can also compute the relative improvement of lexmin+c over d1+c. For example, for \(n=15\), this percentage is \((2.45-1.13)/2.45=\) 54% for the nucleolus, whereas for the other solution concepts it is 53% (contribution value); 49% (tau value); 48% (benefit value); 46% (Banzhaf value); and 38% (Shapley value). Considering the average improvement over \(n=4,\ldots ,15\) now yields percentages of 44% (contribution value); 41% (nucleolus); 35% (tau value); 32% (benefit value); 25% (Shapley value); and 25% (Banzhaf value). Comparing lexmin+c with lexmin, and d1+c with d1, we see again that using c has a substantial effect. Our main conclusion from Fig. 7 is again that using lexmin+c yields the lowest average total relative deviation for all six initial allocations y, with larger differences when the number of countries is growing. Moreover, from Fig. 8 we see again that the Shapley value and the Banzhaf value in the lexmin+c scenario consistently provide the smallest deviations from the target allocations (0.55% and 0.54% for \(n=15\)), while the contribution value for \(n \le 13\) and the nucleolus for \(n\ge 14\) perform the worst.

Fig. 7
figure 7

Average total relative deviations for the situation where the countries vary in size. The number of countries n is ranging from 4 to 15. The figures on the right side zoom in on the figures from the left side by removing the results for the arbitrary matching scenario

Fig. 8
figure 8

Displaying the six lexmin+c graphs of Fig. 7 in one plot

Turning now to the credit-adjusted games and the Banzhaf* value, it remains the case that only the Banzhaf value produces different results among the tested allocations, and only in the scenarios lexmin+c and d1+c (since without credits the games are the same). However, as shown in Figs. 9 and 10, the behaviour of the Banzhaf* value differs under varying country sizes: it performs slightly worse than the Shapley value and the original Banzhaf value for \(n \le 14\) (the differences are again very small: within 0.08% and 0.11% for lexmin+c and d1+c, respectively), but it outperforms both for \(n=15\), by at most 0.05% for lexmin+c and 0.3% for d1+c.

Fig. 9
figure 9

Comparing the lexmin+c graphs for the Shapley value and Banzhaf value from Fig. 7 with the one for the Banzhaf* value (varying country sizes)

Fig. 10
figure 10

Comparing the d1+c graphs for the Shapley value and Banzhaf value from Fig. 7 with the one for the Banzhaf* value (varying country sizes)

If we use the average maximum relative deviation instead of the average total relative deviation, then again we obtain similar results and can draw the same conclusions; see again Appendix 1 for the corresponding figures.

7 Evaluation of further aspects

In this section we evaluate some other aspects of our simulations. Given that the results of our simulations for varying country sizes were similar to the results of our simulations for same country sizes, we only evaluated these aspects for the situation in which all countries have the same size.

7.1 No cooperation

It is a natural question to what extent cooperation between countries helps. Table 1 shows that cooperation leads to a significantly larger total number of kidney transplants than non-cooperation, especially as more countries participate in the IKEP. In particular, Table 1 shows that 2.86 times as many kidney transplants can be obtained in the case where the number of countries is 15. Hence, Table 1 provides strong evidence for forming large IKEPs.

We also note the following. Theoretically, a change in scenario may result in a change in maximum matching size (total number of kidney transplants). However, Table 1 shows that these differences turn out to be negligible (between 0.01% and 0.1% on average).

Table 1 For \(n=4,\ldots ,15\), the improvement on the average number of kidney transplants if cooperation is allowed
Fig. 11
figure 11

The average accumulated deviation over the 24 rounds when the number of countries \(n=15\). The right side of the figure is taken from the left side after omitting the additional scenario where arbitrary maximum matchings are chosen

7.2 Credit accumulation

In Sect. 3, we gave a theoretical example where credits build up over time for a certain country and become essentially meaningless. However, this behaviour did not occur in any of our 24-round simulations. For every number n of countries with \(n\in \{4,\ldots ,15\}\), we performed a refined analysis to verify whether such behaviour could be expected if the number of rounds were larger than 24.

First recall that for a single instance, \(c^h_p = x^h_p - y^h_p\) and also that \(c^h_p=\sum _{t=1}^{h-1}(y_p^t - s_p(M^t))\) for country p and round \(h\ge 2\). That is, in each round the credits equal the difference between the target and initial allocations, and they also equal the accumulated deviations from the initial allocations over the previous rounds. The latter accumulation is what we would like to prevent by using the credit function c. In order to assess credit accumulation over time, we define the average accumulated deviation at round h as the average of \({\sum _{p \in N} |c_p^h|}\) over the 100 instances corresponding to a certain scenario and choice of initial allocation.
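The equivalence between the recursive credit update and the closed form above can be checked directly (a small Python sketch with hypothetical per-round numbers for a single country):

```python
# Hypothetical data for one country p over rounds t = 1..4 (index 0 unused).
y = [0, 10.3, 9.8, 10.1, 9.9]   # y[t] = y_p^t, the initial allocation
s = [0, 9, 10, 11, 10]          # s[t] = s_p(M^t), matched pairs of country p

# Recursive update: each round, p is credited with what it was allocated
# but did not receive (c^1_p = 0).
c = [0.0, 0.0]                  # c[h] = c_p^h
for h in range(2, 6):
    c.append(c[h - 1] + (y[h - 1] - s[h - 1]))

# Closed form from the text: c^h_p = sum over t = 1..h-1 of (y_p^t - s_p(M^t)).
closed = [sum(y[t] - s[t] for t in range(1, h)) for h in range(2, 6)]
```

A persistent positive gap \(y_p^t - s_p(M^t)\) would make \(c_p^h\) grow linearly in h; this is exactly the accumulation that Fig. 11 monitors.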

In Fig. 11 we show the results of our analysis. The results displayed are only for \(n=15\), as the figures for \(n\in \{4,\ldots ,14\}\) turned out to be very similar. As Fig. 11 shows, the behaviour of credit accumulation is similar for each choice of initial allocation y. Moreover, as expected, the average accumulated deviation clearly grows over time if arbitrary maximum matchings are chosen as solutions (see the left side figures of Fig. 11). Under lexmin and d1 there is still accumulation. However, the credit system successfully mitigates this effect, as the plots for lexmin+c and d1+c show (see the right side figures of Fig. 11). In particular, there is no indication that this behaviour will change if the number of rounds is larger than 24.

7.3 Computational time

We refer to Table 2 for an overview of the various computational times in our simulations. We note that Lex-Min computes at most \(n-1\) \(d_i\)-values, and in our experiments we actually found instances where \(d_{n-1}\) was computed, even for \(n=10\). However, Table 2 shows that using lexicographically minimal maximum matchings instead of ones that only minimize the largest deviation \(d_1\) from the target allocation does not require a significant amount of additional computation time. It can also be noticed from Table 2 that, as expected, computing the Shapley value and the nucleolus is more expensive than computing the contribution value and the benefit value, especially as the number of countries n grows. Finally, we see from Table 2 that the game generation, that is, computing the \(2^n\) values v(S), becomes by far the most expensive part when n is growing.

Table 2 Computational times for a single instance, broken down to the different computation tasks for lexmin+c, while the total rows for the different scenarios are averaged over the four initial allocations

7.4 Coalitional stability

In order to assess the long-term coalitional stability of an IKEP, we turn our focus towards the core of the accumulated partitioned matching games. These games are obtained by summing up the 24 partitioned matching games of each of the 24 rounds of a simulation instance. That is, the accumulated partitioned matching game \((N,v)\) is obtained from the partitioned matching games \((N,v^h)\) on compatibility graphs \(D^h\) for \(h=1, \dots , 24\) by setting \(v=\sum _{h=1}^{24} v^h\). We define the accumulated initial allocation as \(y=\sum _{h=1}^{24} y^h\) and the accumulated solution as the accumulated number of kidney transplants \(s=\sum _{h=1}^{24} s(M^h)\), where \(M^h\) is the chosen matching in round h.
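The core test applied to these accumulated games can be sketched as follows (brute force over all coalitions; the 3-country game values are hypothetical and the helper name is ours):

```python
from itertools import combinations

def in_core(z, v, n):
    # z is in the core iff it is efficient and no coalition can block it:
    # sum of z over every S must be at least v(S).
    players = range(n)
    if abs(sum(z) - v(frozenset(players))) > 1e-9:
        return False
    return all(sum(z[p] for p in S) >= v(frozenset(S)) - 1e-9
               for k in range(1, n) for S in combinations(players, k))

# A toy accumulated game on 3 countries (hypothetical values).
val = {frozenset(): 0, frozenset({0}): 0, frozenset({1}): 0, frozenset({2}): 0,
       frozenset({0, 1}): 2, frozenset({0, 2}): 2, frozenset({1, 2}): 4,
       frozenset({0, 1, 2}): 6}
v = lambda S: val[frozenset(S)]
```

For example, the allocation \((4/3, 7/3, 7/3)\) lies in the core of this toy game, whereas \((4, 1, 1)\) is blocked by the coalition \(\{1,2\}\). The radius-based measure reported in Tables 5 and 6 refines this yes/no test into a distance from violating a core inequality.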

All the accumulated partitioned matching games in our simulation had a nonempty core. Moreover, all accumulated initial allocations and accumulated solutions turned out to be in the core, except for a few rare cases of accumulated solutions under the arbitrary scenario. For comparison, we evaluate how far away both the accumulated initial allocations and accumulated solutions are from violating a core inequality in the accumulated partitioned matching game. We do this by taking the radius of the largest ball that can be fit into the core with its center being an accumulated initial allocation, or accumulated solution, respectively.Footnote 9 Unsurprisingly, this radius decreases as the number of countries increases. Moreover, the distance from violating a core inequality is practically the same, independently of the chosen scenario. We refer to Tables 5 and 6 in Appendix 2 for details and to Table 3 for a summary obtained from these two tables by averaging over the number of countries for the lexmin+c scenario.

From Table 3 we see a high and similar level of stability for all choices of initial allocation. Although the Shapley and Banzhaf values consistently provide the smallest deviations (see Fig. 4), Table 3 shows that the tau value (highest), the benefit value and the nucleolus provide higher levels of coalitional stability, not only for the accumulated initial allocations but also for the accumulated solutions.

Table 3 Average distances, over n ranging from 4 to 15, of accumulated initial allocations (first row) and accumulated solutions from violating a core inequality of the accumulated partitioned matching games under the lexmin+c scenario
Table 4 The first column refers to number of countries

7.5 Convexity and quasibalancedness

Recall that the tau value is only defined if the game is quasibalanced and that we replaced the tau value by the benefit value if the tau value is not defined. We also recall that the tau value and the benefit value coincide when the game is convex. Table 4 provides justification for this replacement.

8 Conclusions

Our simulations showed that using maximum matchings that are lexicographically minimal with respect to the country deviations from target allocations leads to a significant improvement for IKEPs. Moreover, they showed that this improvement is even more significant when the number of countries is large. This is relevant, as IKEPs, such as Eurotransplant, are under development and others, such as Scandiatransplant, are expected to grow.

Both lexicographically minimal maximum matchings and maximum matchings that only minimize the maximum deviation \(d_1\) can be computed in polynomial time. In practice one might expect that the latter can still be computed faster. However, our simulations showed that computing lexicographically minimal maximum matchings does not require any significant additional computational time (see Sect. 7.3).

A challenging part of our project was to compute the nucleolus of partitioned matching games consisting of up to fifteen countries. For this we used the state-of-the-art Lexicographical Descent method of [13].

Future research All the above findings for 2-way exchange cycles are also interesting to investigate in a setting with \(\ell\)-way exchange cycles for \(\ell \ge 3\). The previous experimental studies [16, 36] for \(\ell =3\) only considered 3–4 countries. To do meaningful experiments for a large number of countries, a new practical approach is required to deal with the computational hardness of computing optimal solutions (recall the aforementioned NP-hardness result of [1] for the case where \(\ell \ge 3\)).

We also plan to consider directed compatibility graphs with weights \(w(i,j)\) on the arcs \((i,j)\) representing the utility of transplant \((i,j)\). Computing a maximum-weight solution that minimizes the weighted country deviation \(d_1\) then becomes NP-hard [19]. However, we could still consider the set of maximum-size solutions as our set \({{\mathcal {M}}}\) instead of the set of maximum-weight solutions. We can then find a maximum-weight matching that lexicographically minimizes the original country deviations \(|x_p-s_p(M)|\). The main challenge is to set the weights \(w(i,j)\) appropriately, since optimization policies vary widely among national KEPs. In Europe, maximizing the number of transplants is the first objective (as in our setting). However, further scores are based on different objectives, such as improving the quality of the transplants, easing the complexity of the logistics, or giving priority to highly sensitized patients; see [15] for further details.