1 Introduction

Since Darwin’s time, explaining cooperative behavior in groups of self-interested individuals has been a challenge (Trivers 1971; Axelrod 1984; Hofbauer and Sigmund 1998; Nowak 2006a, b; Henrich and Henrich 2007; Sigmund 2010; Bowles and Gintis 2011). Game theory, including evolutionary game theory, has shown that a population of self-interested individuals playing a social dilemma game of the prisoner’s dilemma type does not sustain cooperation without an additional mechanism. To explain cooperation in social dilemma situations in nature, including in biological populations, and to promote cooperation in human society, various mathematical mechanisms that support cooperation have been proposed. Population structure as represented by contact networks of individuals is one such mechanism. The structure of contact networks constrains who can interact with whom and promotes the emergence and endurance of clusters of cooperative players in local regions in spatial lattices (Axelrod 1984; Nowak and May 1992, 1993; Szabó and Fath 2007) and adjacent pairs of nodes in general networks (Santos and Pacheco 2005; Ohtsuki et al. 2006; Santos et al. 2006; Szabó and Fath 2007; Allen et al. 2017).

A major indicator of the success of a mutant trait in evolutionary dynamics is the fixation probability. It is defined as the probability that the mutant type will spread and eventually occupy the entire population as a result of evolutionary dynamics, given an initial distribution of mutants (Nowak et al. 2004; Ewens et al. 2004; Lieberman et al. 2005; Nowak 2006b). When each individual is in either of the two types (i.e., wild and mutant) at any given time and the population structure is described by a network on N nodes, the state of the network is specified by an N-dimensional binary vector of which the ith entry encodes the type of the ith node. In the absence of mutation, the fixation probability of the mutant starting from the state in which all the nodes are of the wild type is equal to 0. The fixation probability of the mutant is equal to 1 if all the nodes are initially mutant. For general initial conditions, the exact solution of the fixation probability requires solving a linear system of \(2^N-2\) equations (Lieberman et al. 2005; Ohtsuki et al. 2006). Therefore, it is difficult to exactly compute the fixation probability except for small networks, highly symmetric networks, or networks with other mathematically convenient properties.

We focus on social dilemma situations, in particular the prisoner’s dilemma game, in the present paper. In the prisoner’s dilemma, the wild and mutant types correspond to cooperator and defector, respectively, or vice versa. The calculation of the fixation probability for the prisoner’s dilemma game on networks, potentially with some additional assumptions, is usually more involved than the calculation in the case of the constant selection, in which the fitness of the wild and mutant types is fixed throughout the evolutionary dynamics. In games, the fitness of an individual generally depends on how other individuals behave, which makes setting up the linear system of \(2^N-2\) equations and efficiently solving it, particularly the latter, a difficult task. Under this circumstance, weak selection is an assumption that often facilitates analytical evaluation of the fixation probability of the mutant type including in social dilemma games (Nowak et al. 2004). Let us write down each individual’s fitness as a sum of a constant term, called the baseline fitness, and the payoff that the individual receives by playing the game. By definition, weak selection means that the payoff is small compared to the baseline fitness. Under weak selection, Ohtsuki et al. developed a pair approximation theory that enables us to analytically derive the conditions under which cooperation fixates with a larger probability than a baseline on random regular graphs, i.e., random graphs in which all nodes have the same number of neighbors (Ohtsuki et al. 2006). Furthermore, Allen et al. extended this result to the case of arbitrary networks using coalescence times from random walk theory (Allen et al. 2017). With these methods, one can avoid dealing with a set of \(2^N-2\) linear equations and calculate the leading term of the fixation probability in polynomial time in terms of N.

In Allen et al. (2017), the authors derived a key indicator to quantify the ease of cooperation in networks, i.e., the threshold benefit-to-cost ratio above which selection favors cooperation, denoted by \((b/c)^*\). In fact, substantial changes in \((b/c)^*\) may occur when one only slightly perturbs the network structure, which is an operation referred to as graph surgery (Allen et al. 2017). A carefully designed graph surgery may enhance cooperation by reducing \((b/c)^*\) by a larger amount than by a random graph surgery. For example, a small mean degree (i.e., the number of neighbors that a node has) of the network tends to induce cooperation (Ohtsuki et al. 2006; Allen et al. 2017). Therefore, decreasing the weight of an edge or removing an edge is expected to enhance cooperation. However, this may not be an optimal choice. Which particular edge should we perturb or remove to efficiently enhance cooperation? One can answer this question by removing just one edge from the original network, calculating \((b/c)^*\) for the perturbed network, and repeating the same procedure for each different perturbation of the original network. However, this procedure may be computationally costly. Note that the method to calculate the fixation probability for cooperation in arbitrary networks, developed in Allen et al. (2017), is still computationally costly although its computational complexity is polynomial in N.

In the current study, we develop a perturbation theory with the aim of predicting the direction and amount of the change in \((b/c)^*\) when one slightly perturbs the weight of an arbitrary single edge. We find that, for most networks, the actual change in \((b/c)^*\) when we remove an edge and the change predicted by our perturbation theory are strongly correlated, which makes it possible to propose a single edge to be removed for efficiently enhancing cooperation. However, the correlation between the result of direct numerical simulations and the perturbation theory is considerably weaker when one adds a new edge to the existing network. Therefore, our perturbation theory is not practically useful when one adds new edges. Compared to the direct numerical simulations, our perturbation theory is much faster, which allows us to compute the fixation probability under graph surgery in larger networks.

2 Fixation of cooperation on networks under weak selection

We assume that the graph G is connected and undirected. We denote the set of nodes by \(V = \{1, \ldots , N\}\), where N is the number of nodes. For each pair of nodes \(i, j\in V\), we denote the edge weight by \(w_{ij} \ge 0\). If there is no edge between i and j, we set \(w_{ij} = 0\). We allow self-loops, i.e., positive values of \(w_{ii}\) (Allen et al. 2017). The weighted degree of node i, denoted by \(s_i = \sum _{j=1}^N w_{ij}\), also called the node strength, is the sum of the weight of the edges connected to the node.

A discrete-time random walker is said to be simple if the walker located at node i moves to one of its neighbors, denoted by j, in a single time step with probability proportional to \(w_{ij}\), i.e., with probability \(p_{ij} = w_{ij}/s_i\). Let \(W = (w_{ij})\) be the \(N\times N\) weighted adjacency matrix. The transition probability matrix \(P = (p_{ij})\) of the simple random walk is given by \(P = D^{-1}W\), where \(D = \textrm{diag}(s_1, \ldots , s_N)\), i.e., the diagonal matrix whose diagonal entries are equal to \(s_1, s_2, \ldots , s_N\). Let \(\varvec{\pi }= (\pi _1, \ldots , \pi _N)\) be the stationary probability vector of the random walk with transition probability matrix P, i.e., the solution of \(\varvec{\pi }P = \varvec{\pi }\). It holds true that (see Aldous and Fill 2002; Masuda et al. 2017)

$$\begin{aligned} \pi _i = \frac{s_i}{\sum _{\ell =1}^N s_\ell },\quad i\in \{1, \ldots , N\}. \end{aligned}$$
(1)
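As a minimal sketch of Eq. (1), the following Python snippet (assuming NumPy; the function name and the three-node path graph are illustrative choices, not part of the original analysis) builds \(P = D^{-1}W\) and checks numerically that \(\pi _i = s_i/\sum _{\ell } s_\ell \) satisfies \(\varvec{\pi }P = \varvec{\pi }\).

```python
import numpy as np

def transition_and_stationary(W):
    """Return P = D^{-1} W and pi_i = s_i / sum_l s_l for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    s = W.sum(axis=1)        # node strengths s_i
    P = W / s[:, None]       # row-normalize: p_ij = w_ij / s_i
    pi = s / s.sum()         # Eq. (1)
    return P, pi

# Illustrative example: unweighted path graph 0 - 1 - 2
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
P, pi = transition_and_stationary(W)
print(pi)                        # stationary probabilities
print(np.allclose(pi @ P, pi))   # checks pi P = pi
```

For this path graph the node strengths are 1, 2, and 1, so the stationary probabilities are 1/4, 1/2, and 1/4.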

We use the donation game, which is a special case of the prisoner’s dilemma game. In the donation game, which is a two-player game, one player, called the donor, decides whether or not to pay a cost c \((>0)\). If the donor pays c, which we refer to as cooperation, then the other player, called the recipient, receives benefit b \((>c)\). If the donor does not pay c, which we refer to as defection, then the donor does not lose anything, and the recipient does not gain anything. Therefore, the payoff matrix of the donation game for a pair of players is given by

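$$\begin{aligned} \begin{array}{c|cc} &{} \textrm{C} &{} \textrm{D} \\ \hline \textrm{C} &{} b-c &{} -c \\ \textrm{D} &{} b &{} 0 \end{array} \end{aligned}$$
(2)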

where C and D represent cooperation and defection, respectively, and the payoff values represent those for the row player. We assume that each player on a node participates in the game as donor half of the time and as recipient the other half of the time.

We assign 0 and 1 to the defector and cooperator, respectively. Then, we can represent a state of the entire network by a binary vector \({\varvec{x}} =(x_1, \ldots , x_N) \in \{0, 1\}^N\). With this notation, the payoff of node i averaged over all its neighbors is given by

$$\begin{aligned} f_i({\varvec{x}}) = -c x_i + b\sum _{j=1}^N p_{ij}x_j. \end{aligned}$$
(3)

The reproductive rate of node i in state \({\varvec{x}}\) is given by

$$\begin{aligned} R_i ({\varvec{x}}) = 1 + \eta f_i({\varvec{x}}), \end{aligned}$$
(4)

where \(\eta \) represents the strength of the selection. If \(\eta =0\), the reproductive rate does not depend on the payoff matrix or the action (i.e., cooperation or defection) of any node. This case is equivalent to the so-called voter model. If \(\eta \rightarrow 0\), the payoff weakly impacts the selection, and this limit is called the weak selection regime. The idea behind weak selection is that, in reality, many different factors may contribute to the overall fitness of an individual, and the game under consideration is just one such factor (Ohtsuki et al. 2006; Allen et al. 2017).

We drive evolutionary dynamics by the death-birth process with selection on birth on an arbitrary network composed of cooperators and defectors (Ohtsuki et al. 2006; Allen et al. 2017). Specifically, we first select a node to be updated, denoted by i, uniformly at random. Second, we select one of the neighbors of i, denoted by j, for reproduction with probability proportional to \(w_{ij} R_j({\varvec{x}})\). Third, the offspring of j occupies node i, such that node i inherits the type of j. This completes a single round of the evolutionary dynamics, which we schematically show in Fig. 1.

Fig. 1

Death-birth process with selection on birth on an unweighted network. a Each individual obtains a payoff by interacting with all its neighbors. C and D represent cooperator and defector, respectively. b We select, uniformly at random, a node whose type is to be replaced, shown in gray. Then, one of the three neighbors of this node, whose payoff values are indicated, will replace the gray node. We select each of the cooperating neighbors with probability \([1+\eta (b/2-c)] / [1+\eta (b/2-c) + 1+\eta (b/2-c) + 1+\eta (2b/3)] = [6+3\eta (b-2c)]/[18+2\eta (5b-6c)]\) and the defecting neighbor with probability \([1+\eta (2b/3)] / [1+\eta (b/2-c) + 1+\eta (b/2-c) + 1+\eta (2b/3)] = (3+2\eta b)/[9+\eta (5b-6c)]\) for reproduction. c In this example, we select the cooperating neighbor to the left, whose offspring replaces the gray node
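The update rule described above can be sketched in a few lines of Python. This is an illustrative sketch assuming NumPy, not the authors' code; the function names, the triangle network, and the parameter values are hypothetical. It evaluates the payoff of Eq. (3) and the reproductive rate of Eq. (4), and then carries out one round of the death-birth process.

```python
import numpy as np

def payoffs(W, x, b, c):
    """Eq. (3): f_i = -c x_i + b * sum_j p_ij x_j."""
    P = W / W.sum(axis=1, keepdims=True)
    return -c * x + b * (P @ x)

def death_birth_step(W, x, b, c, eta, rng):
    """One round of the death-birth process with selection on birth."""
    R = 1.0 + eta * payoffs(W, x, b, c)    # Eq. (4); eta is assumed small enough that R > 0
    i = rng.integers(len(x))               # node to be updated, chosen uniformly at random
    weights = W[i] * R                     # proportional to w_ij R_j(x)
    j = rng.choice(len(x), p=weights / weights.sum())
    x_new = x.copy()
    x_new[i] = x[j]                        # node i inherits the type of node j
    return x_new

# Illustrative example: triangle with a single cooperator at node 0
W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
x = np.array([1.0, 0.0, 0.0])              # 1 = cooperator, 0 = defector
f = payoffs(W, x, b=3.0, c=1.0)
rng = np.random.default_rng(0)             # hypothetical seed
x_next = death_birth_step(W, x, b=3.0, c=1.0, eta=0.01, rng=rng)
print(f, x_next)
```

In this example the cooperator pays the cost without receiving any benefit, so its payoff is \(-c\), whereas each defector receives \(b/2\).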

The death-birth process in any finite population without mutation will eventually reach either the state in which all individuals are cooperators or the state in which all individuals are defectors, and then halt. In other words, cooperation or defection fixates in finite time with probability 1. Consider the initial condition in which one node is a cooperator and the other \(N-1\) nodes are defectors. There are N such initial conditions depending on which node is the cooperator. We consider the initial probability distribution over all possible states that assigns probability 1/N to each of the states with exactly one cooperator and probability zero to all the other states. We denote by \(\rho _{\textrm{C}}\) the probability that cooperation fixates under this distribution of the initial state. If \(\rho _{\textrm{C}} > 1/N\), natural selection favors cooperation (Nowak et al. 2004; Ohtsuki et al. 2006; Nowak 2006b; Allen et al. 2017).

Allen et al. (2017) showed that

$$\begin{aligned} \rho _{\textrm{C}} = \frac{1}{N} + \frac{\eta }{2N}\left[ -c\tau _2 + b(\tau _3 - \tau _1)\right] + O(\eta ^2), \end{aligned}$$
(5)

where

$$\begin{aligned} \tau _k = \sum _{i=1}^N\sum _{j=1}^N \pi _i p_{ij}^{(k)}t_{ij}, \end{aligned}$$
(6)

\(p_{ij}^{(k)}\) is the (ij)th entry of matrix \(P^k\), which implies that \(p_{ij}^{(1)} = p_{ij}\), and

$$\begin{aligned} t_{ij} = {\left\{ \begin{array}{ll} 0 &{} \textrm{if} \ i=j, \\ 1 + \frac{1}{2}\sum _{k=1}^N(p_{ik}t_{jk} + p_{jk}t_{ik}) &{} \textrm{otherwise}. \end{array}\right. } \end{aligned}$$
(7)

Equation (7) implies that \(t_{ij} = t_{ji}\) is the mean coalescence time of two random walkers when one walker is initially located at node i and the other at node j. Note that \(p_{ij}^{(k)}\) is the k-step transition probability of the random walk from node i to node j. Therefore, \(\tau _k\) is the expected value of \(t_{ij}\) when i and j are the two ends of a k-step random walk trajectory on G under the stationary distribution (Allen et al. 2017). Equation (5) implies that the threshold value of the benefit-to-cost ratio above which the natural selection favors cooperation (i.e., \( \rho _{\textrm{C}} > 1/N\)) is given by

$$\begin{aligned} \left( \frac{b}{c}\right) ^* = \frac{\tau _2}{\tau _3 - \tau _1}. \end{aligned}$$
(8)

Natural selection favors cooperation if \(b/c > (b/c)^*\).
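To make Eqs. (6)–(8) concrete, the following sketch (assuming NumPy; the function names are illustrative, and the dense \(N^2\times N^2\) solve is chosen for readability rather than speed) solves Eq. (7) for the coalescence times and evaluates \(\tau _1\), \(\tau _2\), \(\tau _3\), and \((b/c)^*\) on a small cycle graph.

```python
import numpy as np

def coalescence_times(P):
    """Solve Eq. (7) for T = (t_ij) as a dense N^2 x N^2 linear system."""
    N = P.shape[0]
    M = np.eye(N * N)
    d = np.ones(N * N)
    for i in range(N):
        for j in range(N):
            r = i * N + j                            # row-major index of the pair (i, j)
            if i == j:
                d[r] = 0.0                           # t_ii = 0
                continue
            M[r, j * N:(j + 1) * N] -= 0.5 * P[i]    # -(1/2) p_ik t_jk
            M[r, i * N:(i + 1) * N] -= 0.5 * P[j]    # -(1/2) p_jk t_ik
    return np.linalg.solve(M, d).reshape(N, N)

def critical_ratio(W):
    """Return [tau_1, tau_2, tau_3] from Eq. (6) and (b/c)^* from Eq. (8)."""
    W = np.asarray(W, dtype=float)
    s = W.sum(axis=1)
    P = W / s[:, None]
    pi = s / s.sum()
    T = coalescence_times(P)
    tau = [float(np.sum(pi[:, None] * np.linalg.matrix_power(P, k) * T))
           for k in (1, 2, 3)]
    return tau, tau[1] / (tau[2] - tau[0])

# Illustrative example: unweighted cycle with N = 6 nodes (regular, degree 2)
N = 6
W = np.zeros((N, N))
for u in range(N):
    W[u, (u + 1) % N] = W[(u + 1) % N, u] = 1.0
tau, bc = critical_ratio(W)
print(np.round(tau, 6), round(bc, 6))
```

The cycle is regular with degree 2, for which the known closed-form values are \(\tau _1 = N-1 = 5\), \(\tau _2 = N-2 = 4\), and \(\tau _3 = N + N/2 - 3 = 6\), giving \((b/c)^* = 4\); the numerical output should agree with these values up to rounding.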

For example, if the underlying network is regular with degree k, we have

$$\begin{aligned}&\tau _1 = N - 1, \end{aligned}$$
(9)
$$\begin{aligned}&\tau _2 = N - 2, \end{aligned}$$
(10)

and

$$\begin{aligned} \tau _3 = N + \frac{N}{k} - 3, \end{aligned}$$
(11)

such that

$$\begin{aligned} \left( \frac{b}{c}\right) ^* = k \end{aligned}$$
(12)

as \(N\rightarrow \infty \) (Allen et al. 2017). Note that the right-hand side of Eq. (8) only depends on the adjacency matrix of the network. In other words, the structure of the contact network determines whether and how much natural selection favors cooperation.
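As a short worked step, substituting Eqs. (9)–(11) into Eq. (8) gives the finite-N expression from which the limit in Eq. (12) follows:

$$\begin{aligned} \left( \frac{b}{c}\right) ^* = \frac{N-2}{\left( N + \frac{N}{k} - 3\right) - (N-1)} = \frac{N-2}{\frac{N}{k}-2} \rightarrow k \quad (N\rightarrow \infty ). \end{aligned}$$

For instance, with \(N=100\) and \(k=6\) (values chosen here only for illustration), this expression yields \((b/c)^* = 98/(100/6-2) \approx 6.68\), which is already close to the limiting value \(k=6\).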

3 Perturbation theory for graph surgery

In this section, we develop a perturbation theory to determine the change in \((b/c)^*\) when one perturbs the weight of a single edge. To this end, we start by rewriting Eq. (6) in terms of matrices and vectors. Let \({\varvec{1}} = (1, \ldots , 1)^\top \), where \(^\top \) represents the transposition. Let \(T = (t_{ij})\) be the \(N\times N\) matrix of the mean coalescence time. Using these notations, we rewrite Eq. (6) as

$$\begin{aligned} \tau _k = \varvec{\pi }\left( P^k\circ T \right) {\varvec{1}}, \end{aligned}$$
(13)

where \(k=1, 2, 3\), and \(\circ \) represents the Hadamard product.

If one changes the weight of an edge \((i_0, j_0)\) by \(\varepsilon \), where \(| \varepsilon | \ll 1\), including the case in which we create a new edge with weight \(\varepsilon \) \((> 0)\), the perturbed network remains connected and undirected. Therefore, one can still use Eq. (8) to compute \((b/c)^*\). Equation (8) uses Eq. (6), which requires \(\varvec{\pi }\), P, and T. We denote these variables after the perturbation by \(\varvec{\pi }(\varepsilon )\), \(P(\varepsilon )\), and \(T(\varepsilon )\). To distinguish the quantities before and after the perturbation, we denote these variables before the perturbation by \(\varvec{\pi }(0)\), P(0), and T(0).

For writing down \(\varvec{\pi }(\varepsilon )\), we denote by

$$\begin{aligned} S = \sum _{i=1}^N s_{i} = \sum _{i=1}^N \sum _{j=1}^N w_{ij} \end{aligned}$$
(14)

the sum of the weighted degree over all the nodes. Under a small perturbation, we carry out the Taylor expansion of Eq. (1) to obtain

$$\begin{aligned} \varvec{\pi }(\varepsilon ) = \varvec{\pi }(0) + \varepsilon \Delta \varvec{\pi }+ o(\varepsilon ), \end{aligned}$$
(15)

where \(\Delta \varvec{\pi }= (\Delta \pi _1, \ldots , \Delta \pi _N)\). We obtain

$$\begin{aligned} \Delta \pi _i = \frac{\delta _{ii_0} + \delta _{ij_0}}{S} - \frac{2\pi _i(0)}{S}, \end{aligned}$$
(16)

where \(\delta _{ij}\) is the Kronecker delta. We present the derivation of Eq. (16) in Appendix A.
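Equation (16) can be checked numerically with a finite difference. The sketch below assumes NumPy and \(i_0 \ne j_0\); the function names, the four-node path graph, and the step size are illustrative choices.

```python
import numpy as np

def stationary(W):
    """Eq. (1): pi_i = s_i / S."""
    s = W.sum(axis=1)
    return s / s.sum()

def delta_pi(W, i0, j0):
    """Eq. (16): first-order change of pi when w_{i0 j0} = w_{j0 i0} changes by epsilon."""
    s = W.sum(axis=1)
    S = s.sum()
    e = np.zeros(len(s))
    e[i0] += 1.0
    e[j0] += 1.0                     # delta_{i i0} + delta_{i j0}
    return e / S - 2.0 * (s / S) / S

# Illustrative example: path graph 0 - 1 - 2 - 3, new edge between nodes 0 and 2
W = np.zeros((4, 4))
for u in range(3):
    W[u, u + 1] = W[u + 1, u] = 1.0
i0, j0, eps = 0, 2, 1e-6
W_eps = W.copy()
W_eps[i0, j0] += eps
W_eps[j0, i0] += eps
fd = (stationary(W_eps) - stationary(W)) / eps   # finite-difference estimate
dp = delta_pi(W, i0, j0)
print(np.allclose(fd, dp, atol=1e-5))
```

Because \(\varvec{\pi }(\varepsilon )\) remains a probability vector, the entries of \(\Delta \varvec{\pi }\) sum to zero, which provides a second quick sanity check.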

To calculate \(P(\varepsilon )\), we define a symmetric indicator function, denoted by \(\chi _{i_0 j_0}\), by

$$\begin{aligned} \chi _{i_0 j_0}(i, j) = {\left\{ \begin{array}{ll} 1 &{} \mathrm {if\ } (i, j) = (i_0, j_0)\ \textrm{or}\ (i, j) = (j_0, i_0),\\ 0 &{} \textrm{otherwise}. \end{array}\right. } \end{aligned}$$
(17)

We obtain

$$\begin{aligned} P(\varepsilon )&= P(0) + \varepsilon \Theta ^{(1)} + o(\varepsilon ), \end{aligned}$$
(18)
$$\begin{aligned} P^2(\varepsilon )&= P^2(0) + \varepsilon \left[ \Theta ^{(1)} P(0) + P(0) \Theta ^{(1)}\right] + o(\varepsilon ) \nonumber \\&:= P^2(0) + \varepsilon \Theta ^{(2)} + o(\varepsilon ), \end{aligned}$$
(19)
$$\begin{aligned} P^3(\varepsilon )&= P^3(0) + \varepsilon \left[ \Theta ^{(1)} P^2(0) + P(0) \Theta ^{(1)} P(0) + P^2(0) \Theta ^{(1)}\right] + o(\varepsilon ) \nonumber \\&:= P^3(0) + \varepsilon \Theta ^{(3)} + o(\varepsilon ), \end{aligned}$$
(20)

where \(\Theta ^{(1)} = (\theta ^{(1)}_{ij})\), \(\Theta ^{(2)} = (\theta ^{(2)}_{ij})\), and \(\Theta ^{(3)} = (\theta ^{(3)}_{ij})\) are \(N \times N\) matrices whose entries are given by

$$\begin{aligned} \theta ^{(1)}_{ij}&= \frac{\chi _{i_0 j_0}(i, j)}{s_i} - p_{ij}(0) \frac{\delta _{ii_0} + \delta _{ij_0}}{s_i}, \end{aligned}$$
(21)
$$\begin{aligned} \theta ^{(2)}_{ij}&= \frac{\delta _{ii_0}p_{j_0j}(0)}{s_i} + \frac{\delta _{ij_0}p_{i_0j}(0)}{s_i} - p_{ij}^{(2)}(0)\frac{\delta _{ii_0}+\delta _{ij_0}}{s_i} \nonumber \\&\quad + \frac{\delta _{jj_0}p_{ii_0}(0)}{s_{i_0}} + \frac{\delta _{ji_0}p_{ij_0}(0)}{s_{j_0}} - p_{ii_0}(0)p_{i_0j}(0) \frac{1}{s_{i_0}} - p_{ij_0}(0)p_{j_0j}(0) \frac{1}{s_{j_0}}, \end{aligned}$$
(22)

and

$$\begin{aligned} \theta ^{(3)}_{ij}&= \frac{\delta _{ii_0}}{s_{i_0}}p^{(2)}_{j_0j}(0) + \frac{\delta _{ij_0}}{s_{j_0}}p^{(2)}_{i_0j}(0) - p^{(3)}_{ij}(0)\frac{\delta _{ii_0}+\delta _{ij_0}}{s_i} \nonumber \\&\quad + \frac{p_{ii_0}(0)p_{j_0j}(0)}{s_{i_0}} + \frac{p_{ij_0}(0)p_{i_0j}(0)}{s_{j_0}} - \frac{p_{ii_0}(0)p_{i_0j}^{(2)}(0)}{s_{i_0}} - \frac{p_{ij_0}(0)p_{j_0j}^{(2)}(0)}{s_{j_0}} \nonumber \\&\quad + \frac{\delta _{jj_0}}{s_{i_0}}p^{(2)}_{ii_0}(0) + \frac{\delta _{ji_0}}{s_{j_0}}p^{(2)}_{ij_0}(0) - p^{(2)}_{ii_0}(0)p_{i_0j}(0) \frac{1}{s_{i_0}} - p^{(2)}_{ij_0}(0)p_{j_0j}(0) \frac{1}{s_{j_0}}. \end{aligned}$$
(23)

We show the derivation of Eqs. (21), (22), and (23) in Appendix B.
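A compact way to evaluate \(\Theta ^{(1)}\), \(\Theta ^{(2)}\), and \(\Theta ^{(3)}\) numerically is to code Eq. (21) entrywise and then apply the product-rule expressions in Eqs. (19) and (20), rather than the expanded forms in Eqs. (22) and (23). The following sketch assumes NumPy and \(i_0\ne j_0\); the function name and the four-node cycle are illustrative.

```python
import numpy as np

def theta_matrices(W, i0, j0):
    """Return P(0) and the first-order coefficients of P, P^2, P^3 in epsilon."""
    s = W.sum(axis=1)
    P = W / s[:, None]
    N = len(s)
    chi = np.zeros((N, N))
    chi[i0, j0] = chi[j0, i0] = 1.0                # indicator of Eq. (17)
    e = np.zeros(N)
    e[i0] = e[j0] = 1.0                            # delta_{i i0} + delta_{i j0}
    Th1 = (chi - P * e[:, None]) / s[:, None]      # Eq. (21)
    Th2 = Th1 @ P + P @ Th1                        # Eq. (19)
    Th3 = Th1 @ P @ P + P @ Th1 @ P + P @ P @ Th1  # Eq. (20)
    return P, Th1, Th2, Th3

# Illustrative example: 4-cycle, new edge between nodes 0 and 2
W = np.zeros((4, 4))
for u in range(4):
    W[u, (u + 1) % 4] = W[(u + 1) % 4, u] = 1.0
i0, j0, eps = 0, 2, 1e-6
P, Th1, Th2, Th3 = theta_matrices(W, i0, j0)
W_eps = W.copy()
W_eps[i0, j0] += eps
W_eps[j0, i0] += eps
P_eps = W_eps / W_eps.sum(axis=1, keepdims=True)
fd3 = (np.linalg.matrix_power(P_eps, 3) - np.linalg.matrix_power(P, 3)) / eps
print(np.allclose(fd3, Th3, atol=1e-4))
```

Each row of \(P^k(\varepsilon )\) sums to 1 for any \(\varepsilon \), so every row of \(\Theta ^{(k)}\) must sum to zero.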

We next calculate \(T(\varepsilon )\). Matrix \(T(0) = (t_{ij}(0))\) satisfies

$$\begin{aligned} t_{ij}(0) = {\left\{ \begin{array}{ll} 0 &{} \textrm{if}\ i = j, \\ 1 + \frac{1}{2} \left[ \sum _{k=1}^{j-1} p_{ik}(0)t_{kj}(0) + \sum _{k=j+1}^{N} p_{ik}(0)t_{jk}(0) \right. \\ \left. + \sum _{k=1}^{i-1} p_{jk}(0)t_{ki}(0) + \sum _{k=i+1}^{N} p_{jk}(0)t_{ik}(0) \right] &{} \textrm{if}\ i < j, \\ t_{ji}(0) &{} \textrm{if}\ i > j, \end{array}\right. } \end{aligned}$$
(24)

which we obtain by applying \(t_{ij}(0) = t_{ji}(0)\) to Eq. (7). Note that \(\{p_{11}(0), p_{12}(0), \ldots , p_{NN}(0) \}\) are known from the network structure and that \(\{t_{11}(0), t_{12}(0), \ldots , t_{NN}(0) \}\) are unknowns. We stack Eq. (24) for the different i and j values in lexicographical order of (ij) on the left-hand side. In other words, the first equation is \(t_{11}(0) = 0\), the second equation is \(t_{12}(0) - \frac{1}{2} p_{11}(0)t_{12}(0) - \frac{1}{2}\sum _{k=3}^{N} p_{1k}(0)t_{2k}(0) - \frac{1}{2} \sum _{k=2}^{N} p_{2k}(0)t_{1k}(0) = 1\), the third equation is \(t_{13}(0) - \frac{1}{2} p_{11}(0)t_{13}(0) - \frac{1}{2} p_{12}(0)t_{23}(0) - \frac{1}{2} \sum _{k=4}^{N} p_{1k}(0)t_{3k}(0) - \frac{1}{2} \sum _{k=2}^{N} p_{3k}(0)t_{1k}(0) = 1\), and so on. Denote by \(\textrm{vec}(T(0))\) the thus obtained vectorization of matrix T(0), i.e.,

$$\begin{aligned} \textrm{vec}(T(0)) = (t_{11}(0), \ldots , t_{1N}(0); t_{21}(0), \ldots , t_{2N}(0); \ldots , t_{N1}(0), \ldots , t_{NN}(0))^\top . \end{aligned}$$
(25)

Equation (25) is a redundant expression because T(0) is a symmetric matrix and its diagonal elements are equal to 0. However, we use Eq. (25) in the following text because it makes the theoretical derivations and computational implementation easier than the most compact vector form of T(0), which would be \(N(N-1)/2\)-dimensional. Using Eq. (25), we rewrite Eq. (24) as

$$\begin{aligned} M(0)\textrm{vec}(T(0)) = {\varvec{d}}, \end{aligned}$$
(26)

where M(0) is the \(N^2\times N^2\) matrix whose entries are determined by Eq. (24), and \({\varvec{d}}\) is the \(N^2\)-dimensional column vector whose \(((k-1)N+k)\)th entry is equal to 0 for all \(k\in \{1,\ldots , N\}\) and whose other entries are all equal to 1. Because it also holds true that \(M(\varepsilon ) \textrm{vec}(T(\varepsilon )) = {\varvec{d}}\), the calculation of \(T(\varepsilon )\) requires \(M(\varepsilon )\), which is the matrix with perturbation, defined similarly to M(0). We obtain the entries of \(M(\varepsilon )\) from those of M(0) by replacing each \(p_{ij}(0)\) (with \(i, j \in \{1, \ldots , N\}\)) by \(p_{ij}(\varepsilon )\). We write the Taylor expansion of \(M(\varepsilon )\) as

$$\begin{aligned} M(\varepsilon ) = M(0) + \varepsilon \Delta M + o(\varepsilon ) \end{aligned}$$
(27)

and calculate \(\Delta M\) as follows.

We write \(\Delta M\) as a block matrix

$$\begin{aligned} \Delta M = \begin{pmatrix} \Delta _{11} &{} \Delta _{12} &{} \cdots &{} \Delta _{1N}\\ \Delta _{21} &{} \Delta _{22} &{} \cdots &{} \Delta _{2N}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \Delta _{N1} &{} \Delta _{N2} &{} \cdots &{} \Delta _{NN} \end{pmatrix}, \end{aligned}$$
(28)

where each \(\Delta _{ij}\) is an \(N\times N\) matrix. We show the derivation of each \(\Delta _{ij}\) in Appendix C. We point out that the number of nonzero rows of \(\Delta M\) is equal to \(2\times (N-2) + (N-1) \times 2 = 4N-6\), which is much smaller than \(N^2\) for a large N (see Appendix C).

To derive the first-order term of \(T(\varepsilon )\) from \(\Delta M\), we use Eq. (27) to obtain

$$\begin{aligned} \textrm{vec}(T(\varepsilon ))&= M(\varepsilon )^{-1} {\varvec{d}} \nonumber \\&= (M(0) + \varepsilon \Delta M + o(\varepsilon ))^{-1} {\varvec{d}} \nonumber \\&= \left[ M(0) (I + \varepsilon M(0)^{-1} \Delta M + o(\varepsilon )) \right] ^{-1} {\varvec{d}} \nonumber \\&= \left[ I - \varepsilon M(0)^{-1} \Delta M + o(\varepsilon ) \right] M(0)^{-1} {\varvec{d}} \nonumber \\&= \left( I - \varepsilon M(0)^{-1} \Delta M \right) \textrm{vec}(T(0)) + o(\varepsilon ) \nonumber \\&= \textrm{vec}(T(0)) - \varepsilon M(0)^{-1} \Delta M \textrm{vec}(T(0)) + o(\varepsilon ). \end{aligned}$$
(29)

Therefore, we obtain

$$\begin{aligned} T(\varepsilon ) := T(0) + \varepsilon \Delta T + o(\varepsilon ), \end{aligned}$$
(30)

where \(\Delta T\) is the \(N \times N\) matrix satisfying

$$\begin{aligned} \textrm{vec}(\Delta T) = - M(0)^{-1} \Delta M \textrm{vec}(T(0)). \end{aligned}$$
(31)

Finally, using Eq. (13), we derive the perturbed \(\tau _k(\varepsilon )\) as follows:

$$\begin{aligned} \tau _k(\varepsilon ) = \tau _k(0) + \varepsilon \Gamma _k + o(\varepsilon ), \end{aligned}$$
(32)

where

$$\begin{aligned} \Gamma _k = \Delta \varvec{\pi }(P^k(0) \circ T(0)) {\varvec{1}} + \varvec{\pi }(0) (\Theta ^{(k)} \circ T(0)) {\varvec{1}} + \varvec{\pi }(0) (P^k(0) \circ \Delta T) {\varvec{1}}. \end{aligned}$$
(33)

By substituting Eq. (32) in Eq. (8), we obtain

$$\begin{aligned} \left( \frac{b}{c}\right) ^*(\varepsilon )&:= \left( \frac{b}{c}\right) ^*(0) + \varepsilon \Delta \left( \frac{b}{c}\right) ^* + o(\varepsilon ), \end{aligned}$$
(34)

where

$$\begin{aligned} \Delta \left( \frac{b}{c}\right) ^* = \frac{(\tau _3(0)-\tau _1(0))\Gamma _2-\tau _2(0)(\Gamma _3-\Gamma _1)}{(\tau _3(0)-\tau _1(0))^2}. \end{aligned}$$
(35)
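The whole chain from Eq. (13) to Eq. (35) can be assembled into a short numerical sketch. This is not the authors' implementation: it assumes NumPy and \(i_0\ne j_0\), uses the full \(N^2\)-dimensional form of Eq. (7) with dense solves, and obtains \(\Delta M\) by exploiting that M depends linearly on the entries of P (so that replacing P by \(\Theta ^{(1)}\) in the P-dependent part of M gives \(\Delta M\)) instead of the block formulas of Appendix C. All function names and the example network are illustrative.

```python
import numpy as np

def linear_part(P):
    """P-dependent part L of M = I + L: row (i, j), i != j, encodes
    -(1/2) sum_k (p_ik t_jk + p_jk t_ik); rows (i, i) are zero."""
    N = P.shape[0]
    L = np.zeros((N * N, N * N))
    for i in range(N):
        for j in range(N):
            if i != j:
                r = i * N + j
                L[r, j * N:(j + 1) * N] -= 0.5 * P[i]
                L[r, i * N:(i + 1) * N] -= 0.5 * P[j]
    return L

def quantities(W):
    s = W.sum(axis=1)
    P, pi, N = W / s[:, None], s / s.sum(), len(s)
    M = np.eye(N * N) + linear_part(P)
    d = np.ones(N * N)
    d[::N + 1] = 0.0                                   # entries for t_ii
    T = np.linalg.solve(M, d).reshape(N, N)            # Eq. (26)
    Pk = [np.linalg.matrix_power(P, k) for k in (1, 2, 3)]
    tau = [pi @ (Q * T) @ np.ones(N) for Q in Pk]      # Eq. (13)
    return s, P, pi, M, T, Pk, tau

def critical_ratio(W):
    tau = quantities(W)[-1]
    return tau[1] / (tau[2] - tau[0])                  # Eq. (8)

def predicted_slope(W, i0, j0):
    """First-order coefficient of (b/c)^* in epsilon, Eq. (35)."""
    s, P, pi, M, T, Pk, tau = quantities(W)
    N, S, one = len(s), s.sum(), np.ones(len(s))
    e = np.zeros(N)
    e[i0] = e[j0] = 1.0
    chi = np.zeros((N, N))
    chi[i0, j0] = chi[j0, i0] = 1.0
    d_pi = e / S - 2.0 * pi / S                        # Eq. (16)
    Th1 = (chi - P * e[:, None]) / s[:, None]          # Eq. (21)
    Th = [Th1, Th1 @ P + P @ Th1,
          Th1 @ P @ P + P @ Th1 @ P + P @ P @ Th1]     # Eqs. (19), (20)
    dM = linear_part(Th1)                              # Delta M via linearity in P
    dT = -np.linalg.solve(M, dM @ T.ravel()).reshape(N, N)   # Eq. (31)
    G = [d_pi @ (Pk[k] * T) @ one + pi @ (Th[k] * T) @ one
         + pi @ (Pk[k] * dT) @ one for k in range(3)]  # Eq. (33)
    t1, t2, t3 = tau
    return ((t3 - t1) * G[1] - t2 * (G[2] - G[0])) / (t3 - t1) ** 2

# Illustrative example: 8-cycle with one chord; perturb the chord weight
N = 8
W = np.zeros((N, N))
for u in range(N):
    W[u, (u + 1) % N] = W[(u + 1) % N, u] = 1.0
W[0, 3] = W[3, 0] = 1.0
i0, j0, eps = 0, 3, 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[i0, j0] += eps; Wp[j0, i0] += eps
Wm[i0, j0] -= eps; Wm[j0, i0] -= eps
fd = (critical_ratio(Wp) - critical_ratio(Wm)) / (2 * eps)   # central difference
slope = predicted_slope(W, i0, j0)
print(slope, fd)
```

The two printed numbers should agree closely, because the central difference estimates the same derivative of \((b/c)^*\) with respect to \(\varepsilon \) that Eq. (35) expresses analytically.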

4 Time complexity

To calculate \((b/c)^*\) for a network with N nodes, the original algorithm requires calculating the mean coalescence time by solving a linear system of \(N(N-1)/2\) variables, i.e., \(t_{ij}\) (with \(i, j \in \{1, \ldots , N \}\) and \(i < j)\), which has a time complexity of \(O(N^6)\). With the Coppersmith-Winograd algorithm (Coppersmith and Winograd 1987), the time complexity is reduced to \(O(N^{4.75})\) (Allen et al. 2017). To determine the single edge whose removal decreases \((b/c)^*\) by the largest amount, for example, one needs to repeat this procedure for each edge. Therefore, the entire procedure with an ordinary algorithm and the Coppersmith-Winograd algorithm requires \(O(N^6|E|)\) and \(O(N^{4.75}|E|)\) time, respectively, where |E| is the number of edges. For a sparse network, for which \(|E| = O(N)\), the time complexity is \(O(N^7)\) and \(O(N^{5.75})\), respectively.

The matrix \(\Delta M\) defined by Eq. (28) is sparse and has a special pattern. If the ith row of \(\Delta M\) is a zero row, then the ith element of vector \(\Delta M \textrm{vec}(T(0))\) is zero, and we do not need to calculate it. Therefore, to calculate \(\Delta M \textrm{vec}(T(0))\), we only need to focus on its \(((i_0-1)N+k)\)th entries, where \(k\in \{1, \ldots , N\} \setminus \{i_0\}\), \(((j_0-1)N+k)\)th entries, where \(k \in \{1, \ldots , N \} \setminus \{j_0 \}\), and \(((k-1)N+i_0)\)th and \(((k-1)N+j_0)\)th entries, where \(k\in \{1,\ldots ,N\} \setminus \{i_0, j_0\}\). All the other entries of \(\Delta M \textrm{vec}(T(0))\) are equal to 0. We show a pseudo algorithm to calculate \(\Delta T\) in Algorithm 1.

Algorithm 1 Calculation of \(\Delta T\)
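The index pattern described above can be enumerated directly. The following sketch uses 0-based indices, so that row \((i, j)\) of \(\Delta M\) corresponds to index \(iN + j\); the function name and the example values of N, \(i_0\), and \(j_0\) are illustrative, and \(i_0 \ne j_0\) is assumed.

```python
def nonzero_rows(N, i0, j0):
    """0-based indices of the rows of Delta M that can be nonzero
    when the weight of edge (i0, j0) is perturbed."""
    rows = set()
    for k in range(N):
        if k != i0:
            rows.add(i0 * N + k)      # rows for pairs (i0, k)
        if k != j0:
            rows.add(j0 * N + k)      # rows for pairs (j0, k)
        if k not in (i0, j0):
            rows.add(k * N + i0)      # rows for pairs (k, i0)
            rows.add(k * N + j0)      # rows for pairs (k, j0)
    return sorted(rows)

rows = nonzero_rows(100, 4, 17)
print(len(rows), 4 * 100 - 6)
```

The number of enumerated rows equals \(4N-6\), compared with \(N^2\) rows in total, which is the source of the savings when computing \(\Delta M \textrm{vec}(T(0))\).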

We now discuss the computational complexity of our perturbation method. Because the inner product of N-dimensional vectors has a time complexity of O(N) and \(\Delta M\) has only \(4N-6\) nonzero rows, the first while loop in Algorithm 1 has a complexity of \(O(N^2)\). The second while loop computes \(\textrm{vec}(\Delta T)\). Because the scalar multiplication of an \(N^2\)-dimensional vector requires \(O(N^2)\) time and \(\Delta M \textrm{vec}(T(0))\) has O(N) nonzero entries, the entire while loop has a time complexity of \(O(N^3)\). Therefore, for a single perturbation experiment, one can carry out the entire algorithm in \(O(N^3)\) time to obtain the perturbed \(\{t_{ij}\}\), and hence \((b/c)^*\). This is considerably lower than \(O(N^{4.75})\) and \(O(N^6)\) with the Coppersmith-Winograd algorithm and the standard algorithm, respectively. The entire procedure to determine the single edge to be removed to maximize cooperation with the perturbation theory requires \(O(N^3 |E|)\) time in general networks and \(O(N^4)\) time for sparse networks.

5 Data

We use the following four synthetic networks and seven empirical networks in our numerical analysis in Sect. 6. We show the number of nodes and that of edges for each network in Table 1 and visualize them in Fig. 2. All the networks are connected networks without self-loops.

Table 1 Pearson correlation coefficient, r, between the shift in \((b/c)^*\) obtained by direct numerical simulations and that predicted by the perturbation theory. Recall that N is the number of nodes and that |E| is the number of edges. A large positive value of r upon edge addition or enhancement implies that the perturbation theory is good at predicting the outcome of adding or enhancing an edge. A large negative value of r upon edge removal implies that the perturbation theory is good at predicting the outcome of removing an edge

We use a network generated by the Erdős-Rényi (ER) random graph with \(N=100\) nodes. We connect 300 pairs of nodes out of the \(N(N-1)/2 = 4950\) pairs of nodes selected uniformly at random. The average degree \(\langle k\rangle = 6\).
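A network of this type can be generated as in the following sketch, which uses only the Python standard library. The function name and the seed are hypothetical. Note that a sample drawn in this way is not guaranteed to be connected, so one would need to resample or otherwise check connectedness to satisfy the assumption stated above.

```python
import itertools
import random

def er_edges(n, m, seed=None):
    """Choose m distinct node pairs uniformly at random out of the n(n-1)/2 pairs."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(pairs, m)

edges = er_edges(100, 300, seed=0)        # hypothetical seed
mean_degree = 2 * len(edges) / 100        # each edge contributes to two degrees
print(len(edges), mean_degree)
```

With 300 edges on 100 nodes, the mean degree is \(2\times 300/100 = 6\) regardless of which pairs are sampled.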

With the Barabási-Albert (BA) model, we sequentially add new nodes each with \(m=3\) edges that connect to existing nodes according to the linear preferential attachment rule (Barabási et al. 1999). We start the growth process from the star graph with four nodes. The degree distribution approximately obeys \(p(k)\propto k^{-3}\), where p(k) is the probability that a node has degree of k, and \(\propto \) represents “proportional to”, in the limit of \(N\rightarrow \infty \). We set \(N=100\) and \(m=3\), which yields 291 edges, implying \(\langle k\rangle = 5.82 \approx 6\).
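The edge count quoted above follows from simple bookkeeping, assuming that the initial four-node star contributes three edges and that each of the remaining \(N-4\) nodes adds exactly m edges:

$$\begin{aligned} |E| = 3 + m(N-4) = 3 + 3\times 96 = 291, \quad \langle k\rangle = \frac{2|E|}{N} = \frac{582}{100} = 5.82. \end{aligned}$$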

The planted \(\ell \)-partition model, also called the random partition (RP) graph, partitions the set of N nodes into \(\ell \) groups, each of which has \(N/\ell \) nodes (Fortunato 2010). Any pair of nodes in the same group is adjacent to each other with probability \(p_{\textrm{in}}\). Any pair of nodes belonging to different groups is adjacent to each other with probability \(p_{\textrm{out}}\). If \(p_{\textrm{in}}>p_{\textrm{out}}\), the intra-cluster edge density exceeds the inter-cluster edge density such that the network has community structure. We set \(N = 100\), \(\ell = 2\), \(p_{\textrm{in}}=0.11\), and \(p_{\textrm{out}}=0.01\) such that the mean degree \(\langle k\rangle = p_{\textrm{in}}(N/\ell - 1) + p_{\textrm{out}}N(\ell -1)/\ell = 5.89\) in theory. We use a network generated by this model having \(\langle k\rangle = 6.12\).
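Written out with the parameter values given above, the theoretical mean degree is

$$\begin{aligned} \langle k\rangle = 0.11\times (50-1) + 0.01\times \frac{100\times (2-1)}{2} = 5.39 + 0.50 = 5.89. \end{aligned}$$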

The Lancichinetti-Fortunato-Radicchi (LFR) model generates networks with community structure (Lancichinetti et al. 2008). The model generates a power-law degree distribution with power-law exponent \(\gamma \), and a power-law distribution of the size of the community with power-law exponent \(\kappa \). The model also requires the maximal degree \(k_{\textrm{max}}\) and mean degree \(\langle k \rangle \) as input. The mixing parameter \({\overline{\mu }} \in (0,1)\) specifies the fraction of edges that connect different communities. A small value of \({\overline{\mu }}\) leads to strong community structure. We set \(N=100\), \(\gamma =3\), \(\kappa =2\), \(\langle k \rangle = 6\), \(k_{\textrm{max}}=100\), and \({\overline{\mu }}=0.1\). A network generated by this model that we use has \(\langle k\rangle = 6.08\).

Fig. 2

Visualization of the networks used in the numerical analysis. a ER random graph. b BA model. c Planted 2-partition model. d LFR model. e Karate club. f Weaver. g Sparrow. h Lizard. i Dolphin. j Email. k Bird. All the networks are undirected. The linear size of the node is proportional to its degree. We have ignored the weight of the edge in this figure and our analysis

We consider the following seven empirical networks. The karate club network consists of 34 nodes and 78 edges (Zachary 1977). Each node represents a member of a karate club at a university in the United States; the members were observed between 1970 and 1972. The edges represent interaction outside the activities of the club.

The weaver network has 42 nodes and 151 edges (van Dijk et al. 2014). Each node represents a sociable weaver (Philetairus socius) observed in Benfontein Game Farm, Kimberley, South Africa. The observation lasted for 10 months in total: September-December 2010 and 2011, and January-February 2013. Two nodes are adjacent to each other if the two weavers used the same nest chambers either for roosting or nest-building within a series of observations in the same year.

The sparrow network has 52 nodes and 516 edges (Arnberg et al. 2015). A node represents a golden-crowned sparrow (Zonotrichia atricapilla) observed at the University of California, Santa Cruz Arboretum. The data was recorded between January and March 2010 (Arnberg et al. 2015). Although the original network is weighted, we regard this network as an unweighted network.

The lizard network has 60 nodes and 318 edges (Bull et al. 2012). Each node represents a lizard (Tiliqua rugosa) observed in a chenopod shrubland near Bundey Bore Station in South Australia. A data logger unit, which recorded synchronized GPS locations every 10 minutes, was attached to the dorsal surface of the tail of each lizard. Two lizards were regarded to be adjacent to each other if they were within 2 meters of each other in any GPS record.

The dolphin network has 62 nodes and 159 edges (Lusseau et al. 2003). Each node represents a bottlenose dolphin (Tursiops). An edge represents a frequent association between two dolphins.

The email network has 167 nodes and 3251 edges (Michalski et al. 2011). Each node represents an employee of a mid-sized manufacturing company in Poland. An edge between two nodes (i.e., employees) indicates that there exists at least one email correspondence between the two individuals. We do not distinguish between senders and recipients and treat the network as an undirected network.

The bird network has 202 nodes and 11900 edges (Firth and Sheldon 2015). In the experiment, the authors placed nest boxes in Wytham Woods, Oxford, UK, for six days to record individuals that landed on the entrance hole while prospecting for breeding territories. Each node represents a wild bird, which is either a great tit (Parus major), blue tit (Cyanistes caeruleus), marsh tit (Poecile palustris), coal tit (Periparus ater), or Eurasian nuthatch (Sitta europaea). An edge connects two birds that overlapped in nest-box exploration patterns on the same day.

6 Numerical results

6.1 Addition or removal of a single edge

We examine the accuracy with which our perturbation theory describes the change in \((b/c)^*\) when we add or remove an edge in a given unweighted network. Before the perturbation, \(w_{ij} = w_{ji} = 1\) if there exists an edge between the ith and jth nodes, and \(w_{ij} = w_{ji} = 0\) otherwise. In the case of edge addition, we add an edge with weight \(\varepsilon \), where \(0 < \varepsilon \le 1\), between a pair of nodes \((i_0, j_0)\) that are not adjacent in the original network unless we state otherwise. Therefore, \(w_{i_0 j_0} (= w_{j_0 i_0})\) changes from 0 to \(\varepsilon \), and all the other \(w_{ij} \in \{0, 1 \}\) values remain unchanged. The addition of an unweighted edge corresponds to \(\varepsilon =1\). In the case of edge removal, we reduce the weight of an edge \((i_0, j_0)\) in the original network by \(-\varepsilon \), where \(-1 \le \varepsilon < 0\). Therefore, \(w_{i_0 j_0} (= w_{j_0 i_0})\) changes from 1 to \(1 + \varepsilon \), and all the other \(w_{ij}\) values remain unchanged. The complete removal of an unweighted edge corresponds to \(\varepsilon = -1\).
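The single-edge perturbation described above can be sketched in a few lines. This is a minimal illustration that assumes the network is stored as a symmetric NumPy weight matrix; the function name `perturb_edge` is hypothetical and does not come from the original study.

```python
import numpy as np

def perturb_edge(W, i0, j0, eps):
    """Return a copy of the symmetric weight matrix W in which the weight of
    the node pair (i0, j0) is changed by eps (hypothetical helper).

    eps > 0 on a non-adjacent pair adds a new edge of weight eps;
    -1 <= eps < 0 on an existing unit-weight edge reduces its weight to 1 + eps.
    """
    W = np.asarray(W, dtype=float)
    if i0 == j0:
        raise ValueError("self-loops are not considered")
    new_weight = W[i0, j0] + eps
    if new_weight < 0:
        raise ValueError("the perturbed edge weight must stay non-negative")
    W_new = W.copy()  # leave the original network untouched
    W_new[i0, j0] = W_new[j0, i0] = new_weight  # keep the matrix symmetric
    return W_new

# Path graph 0 - 1 - 2 as an unweighted network.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
W_added = perturb_edge(W, 0, 2, 0.5)     # add an edge of weight 0.5
W_removed = perturb_edge(W, 0, 1, -1.0)  # completely remove edge (0, 1)
```

Returning a copy rather than modifying the matrix in place makes it easy to compare many perturbed networks against the same original network.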

We are interested in whether the linear approximation to \((b/c)^*(\varepsilon )\) given by Eq. (34), i.e., \(\Delta (b/c)^*\), which we call the slope, predicts the change in \((b/c)^*\) in response to the addition of a single edge, i.e., \((b/c)^*(1)-(b/c)^*(0)\), or the removal of a single edge, i.e., \((b/c)^*(-1)-(b/c)^*(0)\). We start by directly computing the change in \((b/c)^*\), i.e., \((b/c)^*(\varepsilon )-(b/c)^*(0)\), for various values of \(\varepsilon \) in relatively small networks, either by adding a new edge of weight \(\varepsilon \) \((>0)\) or by reducing the weight of an existing edge to \(1+\varepsilon \) \((<1)\). The outcome of our perturbation theory, i.e., \(\Delta (b/c)^*\), is equal to \(\lim _{\varepsilon \rightarrow 0}\left[ (b/c)^*(\varepsilon ) - (b/c)^*(0)\right] /\varepsilon \), where \((b/c)^*(0)\) and \((b/c)^*(\varepsilon )\) are the values obtained by the direct numerical simulations.
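The limit that defines the slope can be checked by brute force with a finite difference. The sketch below is illustrative only. It assumes the coalescence-time formulation of Allen et al. (2017) as recalled here, namely \(\tau _{ii}=0\), \(\tau _{ij} = 1 + \frac{1}{2}\sum _k (p_{ik}\tau _{kj} + p_{jk}\tau _{ik})\) for \(i \ne j\), and \((b/c)^* = t_2/(t_3-t_1)\); it is not a reproduction of Eq. (34), and the function names are hypothetical.

```python
import numpy as np

def critical_bc_ratio(W):
    """(b/c)* for death-birth updating under weak selection, assuming the
    coalescence-time formulation of Allen et al. (2017) (see lead-in)."""
    W = np.asarray(W, dtype=float)
    N = W.shape[0]
    strength = W.sum(axis=1)       # weighted degree of each node
    P = W / strength[:, None]      # random-walk transition probabilities p_ij
    pi = strength / strength.sum() # stationary distribution of the walk

    # Linear system for tau_ij, flattened row-major (index i * N + j).
    I = np.eye(N)
    A = np.eye(N * N) - 0.5 * (np.kron(P, I) + np.kron(I, P))
    rhs = np.ones(N * N)
    diag = np.arange(N) * (N + 1)  # flattened positions of tau_ii
    A[diag, :] = 0.0
    A[diag, diag] = 1.0
    rhs[diag] = 0.0                # enforce tau_ii = 0
    tau = np.linalg.solve(A, rhs).reshape(N, N)

    def t(n):
        # Expected coalescence time from the two ends of an n-step walk.
        Pn = np.linalg.matrix_power(P, n)
        return float(pi @ (Pn * tau).sum(axis=1))

    return t(2) / (t(3) - t(1))

def finite_difference_slope(W, i0, j0, h=1e-6):
    """Brute-force estimate of [(b/c)*(h) - (b/c)*(0)] / h for pair (i0, j0)."""
    W = np.asarray(W, dtype=float)
    W_h = W.copy()
    W_h[i0, j0] += h
    W_h[j0, i0] += h
    return (critical_bc_ratio(W_h) - critical_bc_ratio(W)) / h

# Cycle with six nodes; the recursion gives tau = d * (6 - d) at distance d,
# so t_1 = 5, t_2 = 4, t_3 = 6 and the ratio evaluates to 4 (up to rounding).
N = 6
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0
ratio = critical_bc_ratio(W)
slope = finite_difference_slope(W, 0, 3)  # pair (0, 3) is not adjacent
```

Each call solves a dense system in \(N^2\) unknowns, so this check is practical only for small networks; that cost is exactly what the perturbation theory is designed to avoid.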

We show in Fig. 3a the relationship between \((b/c)^*(\varepsilon ) - (b/c)^*(0)\) and \(\varepsilon \) when we reduce the weight of a single edge in a BA network with \(N=100\) nodes. Each line in the figure corresponds to an edge whose weight is gradually reduced. Note that \(\varepsilon = 0\) corresponds to the original network. Figure 3a indicates that \((b/c)^*\) roughly monotonically decreases as we gradually decrease the edge weight (i.e., decrease \(\varepsilon \) from 0 to negative values) except near \(\varepsilon = 0\). For this network, the removal of any single edge (i.e., \(\varepsilon =-1\)) leads to a decrease in \((b/c)^*\), implying that the edge removal promotes cooperation. However, we note that a small decrease in the weight of an edge in the original network (e.g., \(\varepsilon = -0.3\)) increases \((b/c)^*\) for some edges, making cooperation more difficult than in the original network. Figure 3a implies that the perturbation theory is not accurate at describing the amount of the change in \((b/c)^*\) upon the edge removal because most of the curves shown in the figure, corresponding to the different edges in the original network, are far from linear. However, we observe that the curves with the largest slopes at \(\varepsilon = 0\) tend to yield the smallest values of \((b/c)^*\) at \(\varepsilon = -1\). Therefore, the perturbation theory, which produces the slope value, is expected to be efficient at detecting the edges whose removal yields the largest decrease in \((b/c)^*\).

We show in Fig. 3b the change in \((b/c)^*\) plotted against \(\varepsilon \) when we add a new edge with weight \(\varepsilon \). Each line corresponds to a pair of nodes between which there is initially no edge. Note that \(\varepsilon =1\) corresponds to the addition of an unweighted edge. We find that the addition of any unweighted edge increases \((b/c)^*\), making cooperation more difficult. However, in contrast to the case of edge removal, the addition of an unweighted edge (i.e., with edge weight \(\varepsilon = 1\)) does not necessarily yield the largest change in \((b/c)^*\) among edges of different weights \(\varepsilon \in (0, 1]\). Specifically, for many node pairs that are initially not adjacent to each other, adding an edge with an intermediate edge weight (e.g., \(\varepsilon \approx 0.7\)) maximizes the increase in \((b/c)^*\) (see Fig. 3b). Another observation is that the slope of the curve at \(\varepsilon = 0\), corresponding to the perturbation theory, is apparently less predictive of the effect of adding an unweighted edge (i.e., \(\varepsilon = 1\)). Specifically, Fig. 3b indicates that, even if the slope at \(\varepsilon = 0\) is large, \((b/c)^*\) at \(\varepsilon = 1\) can be relatively small because \((b/c)^*\) decreases as \(\varepsilon \) increases when \(\varepsilon \) is close to 1. Furthermore, the curves with the largest slopes at \(\varepsilon = 0\) do not yield the largest changes in the \((b/c)^*\) value at \(\varepsilon = 1\), which implies that the perturbation theory is expected to be inefficient at predicting the edge addition that makes cooperation most difficult.

We find similar results for the planted 2-partition model for the gradual removal of a single edge (see Fig. 3c). A notable difference from the case of the BA model is that there exists one edge whose complete removal increases \((b/c)^*\), making cooperation more difficult. The two nodes forming this edge have degrees 2 and 9, which are not outstanding. Furthermore, we have confirmed by running a deterministic approximate modularity maximization algorithm (Clauset et al. 2004), using the function greedy_modularity_communities in NetworkX, that these two nodes belong to the same community among the four communities detected. Therefore, this particular edge appears to be an ordinary edge.
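The community check mentioned above can be sketched as follows. The example uses the NetworkX function greedy_modularity_communities named in the text; the helper `same_community` and the toy graph (two 5-node cliques joined by a single edge) are illustrative assumptions, not the networks analyzed in this study.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def same_community(G, u, v):
    """Return (whether u and v share a detected community, number of communities).

    Hypothetical helper built on greedy modularity maximization
    (Clauset et al. 2004) as implemented in NetworkX.
    """
    communities = greedy_modularity_communities(G)
    # Map every node to the index of the community that contains it.
    membership = {node: idx
                  for idx, nodes in enumerate(communities)
                  for node in nodes}
    return membership[u] == membership[v], len(communities)

# Toy graph: cliques {0,...,4} and {5,...,9} connected by the edge (4, 5).
G = nx.barbell_graph(5, 0)
within, n_communities = same_community(G, 0, 1)
across, _ = same_community(G, 0, 9)
```

For an edge of interest, one would call the helper with its two end nodes and, in addition, compare their degrees with the degree distribution of the network.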

We show in Fig. 3d the dependence of \((b/c)^*\) on \(\varepsilon \) when we gradually increase the weight of an edge that is initially absent in the planted 2-partition network. The slope of the curve of \((b/c)^*\) at \(\varepsilon = 0\) is apparently not strongly related to the change in \((b/c)^*\) at \(\varepsilon = 1\).

We show the results of edge removal in the dolphin network in Fig. 3e. There are two edges out of the 159 edges whose removal (i.e., \(\varepsilon = -1\)) increases \((b/c)^*\), making cooperation more difficult. These two edges are formed by two nodes with degrees 2 and 5 and two other nodes with degrees 2 and 7. These degree values are not outstanding in the entire network. The four nodes belong to the same community among the four communities detected by the same approximate modularity maximization algorithm (Clauset et al. 2004). These results suggest that the two edges appear to be ordinary edges. The removal of any other edge decreases \((b/c)^*\), enhancing cooperation. Similar to the BA model, the curves with the largest slopes at \(\varepsilon = 0\) yield the largest decreases in \((b/c)^*\) at \(\varepsilon = -1\).

Fig. 3

Change in \((b/c)^*\) as a function of the change in the edge weight, \(\varepsilon \). a BA model, removal of an existing edge. b BA model, addition of a new edge. c Planted 2-partition model, removal of an existing edge. d Planted 2-partition model, addition of a new edge. e Dolphin network, removal of an existing edge. f Dolphin network, addition of a new edge. In a, c, and e, each line represents an edge in the original network. In b, d, and f, each line represents a pair of nodes that are not adjacent to each other in the original network. The line colors serve only as a guide to the eye

Fig. 4

Change in \((b/c)^*\) when we remove or add an unweighted edge as a function of the slope \(\Delta (b/c)^*\) of the curves shown in Fig. 3 at \(\varepsilon = 0\). a BA model, removal of an existing edge. b BA model, addition of a new edge. c Planted 2-partition model, removal of an existing edge. d Planted 2-partition model, addition of a new edge. e Dolphin network, removal of an existing edge. f Dolphin network, addition of a new edge. Each circle in a, c, and e represents an edge in the original network. Each circle in b, d, and f represents a pair of nodes that is not adjacent to each other in the original network

We show in Fig. 3f the dependence of \((b/c)^*\) on \(\varepsilon \) when we gradually increase the weight of an edge that is initially absent in the dolphin network. The results are similar to those for the planted 2-partition model shown in Fig. 3d. Many curves yield a decrease in \((b/c)^*\) at \(\varepsilon = 1\), implying that the edge addition can promote cooperation, whereas the converse is the case for many other curves. The slope of the curve of \((b/c)^*\) at \(\varepsilon = 0\) is apparently not strongly related to the change in \((b/c)^*\) at \(\varepsilon = 1\).

The nonlinearity in the curves shown in Fig. 3 indicates that, in most cases, our perturbation theory is not accurate at predicting the amount of change in \((b/c)^*\) when we completely remove or add an edge. Therefore, we turn to ask whether the slope obtained from the perturbation theory is useful for identifying the edge whose removal or addition changes \((b/c)^*\) by a large amount, representing strong promotion or suppression of cooperation in networks. We show in Fig. 4a the relationship between the change in \((b/c)^*\) when we remove an edge from the BA network and the slope \(\Delta (b/c)^*\) obtained from Eq. (34). The two quantities are strongly negatively correlated (Pearson correlation coefficient \(r=-0.86\), sample size \(n = 291, p<0.01\)). This result indicates that the perturbation theory, which is theoretically accurate only in the vicinity of \(\varepsilon = 0\), is good at predicting the outcome of removing an edge. We show in Fig. 4b the change in \((b/c)^*\) when we add a new edge to the same BA network as a function of the slope, \(\Delta (b/c)^*\). The change in \((b/c)^*\) is not strongly positively correlated with \(\Delta (b/c)^*\), suggesting that the perturbation theory is not good at predicting the outcome of adding an edge, although the correlation coefficient is statistically significant owing to the large sample size (\(r=-0.36, n = 4659, p<0.01\)). Note that a large positive correlation coefficient when we add an edge would imply that the perturbation theory is good at predicting the outcome of adding an edge.
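The correlation analysis can be sketched as follows. The numbers are made-up toy values used only to show the computation and are not results from this study; the function names are hypothetical, and the p-value is omitted for brevity.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center both variables
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def top_edge_agrees(slopes, changes):
    """True if the edge with the largest slope is also the edge whose
    removal decreases (b/c)* the most (i.e., has the smallest change)."""
    return int(np.argmax(slopes)) == int(np.argmin(changes))

# Toy data (illustrative only): one entry per edge of a hypothetical network.
slopes = [0.10, 0.20, 0.30, 0.40]        # slope from the perturbation theory
changes = [-0.12, -0.25, -0.28, -0.50]   # (b/c)*(-1) - (b/c)*(0) upon removal
r = pearson_r(slopes, changes)           # negative for edge removal
agree = top_edge_agrees(slopes, changes)
```

For edge removal, a value of r close to -1 indicates that the slope ranks edges well; for edge addition, the corresponding signature would be a value close to +1.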

We show in Fig. 4c and d the results for the same correlation analysis for the planted 2-partition model network. When one removes an existing edge, the change in \((b/c)^*\) and slope \(\Delta (b/c)^*\) are strongly negatively correlated (\(r=-0.80, n = 306, p <0.01\); see Fig. 4c), which is similar to the result for the BA model shown in Fig. 4a, suggesting that the perturbation theory is good at predicting the outcome of removing an edge. When one adds a new edge, the change in \((b/c)^*\) and slope \(\Delta (b/c)^*\) are weakly correlated for this network (\(r=-0.39, n=4644, p<0.01\); see Fig. 4d), which is similar to the result for the BA model shown in Fig. 4b.

We show the corresponding results for the dolphin network in Fig. 4e and f. The change in \((b/c)^*\) and slope \(\Delta (b/c)^*\) are strongly negatively correlated when one removes an edge (\(r=-0.72, n=159, p<0.01\); see Fig. 4e) and less strongly correlated when one adds a new edge (\(r=0.56, n=1732, p<0.01\); see Fig. 4f). The strongly negative correlation for the edge removal (i.e., \(r=-0.72\)) is similar to the result for the BA model. The positive correlation for the edge addition (i.e., \(r=0.56\)) implies that the perturbation theory is to some extent good at predicting the outcome of adding an edge.

We show in Table 1 the same relationships for the other networks. For all synthetic and empirical networks, the slope \(\Delta (b/c)^*\) obtained from perturbation theory is strongly negatively correlated with the change in \((b/c)^*\) when we remove an existing edge (\(r\le -0.72\)). Therefore, the perturbation theory is effective at predicting the outcome of removing an edge across different networks. However, the correlation is strongly positive only for a small fraction of networks (i.e., \(r\ge 0.5\) for three out of the nine networks) when we add a new edge to the network.

6.2 Enhancement of the weight of an existing edge

In this section, we allow weighted networks and consider an increase or decrease in the weight of an existing edge of the network. Because we effectively analyzed the case of the decrease in the edge weight in Sect. 6.1 (i.e., by setting \(-1< \varepsilon < 0\)), here we only consider enhancement of the weight of an existing edge by \(\varepsilon \).

We enhanced the weight of an existing edge by \(0<\varepsilon \le 1\), making the edge weight \(1+\varepsilon \), and numerically examined \((b/c)^*\) in the altered weighted networks. We plot in Fig. 5a, c, and e the change in \((b/c)^*\) relative to the original network against \(\varepsilon \) for the three networks used in Figs. 3 and 4. For the BA model, increasing the weight of 74 out of the 291 existing edges from 1 to 2 (i.e., \(\varepsilon = 1\)) led to an increase in \((b/c)^*\), making cooperation more difficult, whereas the opposite is the case when one enhances the weight of any other edge (see Fig. 5a). This result contrasts with the case of adding a new edge to the same network, which always increases \((b/c)^*\) (see Fig. 3b). In the planted 2-partition model (Fig. 5c) and the dolphin network (Fig. 5e), enhancing the edge weight of 7 out of the 306 edges and 23 out of the 159 edges, respectively, led to an increase in \((b/c)^*\). Therefore, in a majority of cases, cooperation becomes easier by enhancing the weight of a single edge, which contrasts with the results for adding a new edge to these networks (see Fig. 3d and f). These results altogether suggest that adding new edges and enhancing the weight of existing edges often lead to different results.

Figures 5a, c, and e also indicate that the change in \((b/c)^*\) is close to linear as a function of \(\varepsilon \). Therefore, our perturbation theory should be accurate at estimating the change in \((b/c)^*\) with \(\varepsilon =1\). To verify this prediction, we show in Fig. 5b, d, and f the relationship between the change in \((b/c)^*\) in response to changing the weight of a single edge from 1 to 2 and \(\Delta (b/c)^*\) obtained by the perturbation theory for the three networks. As expected, the accuracy of the perturbation theory is high. We have confirmed that a high accuracy also holds true for other networks (see the last column of Table 1). These high accuracy results are in stark contrast to the results in the case of adding a new edge, for which the accuracy of the perturbation theory is low.

Fig. 5

Change in \((b/c)^*\) when one enhances the weight of an existing edge. Panels a, c, e: Change in \((b/c)^*\) as a function of the increase in the weight of an existing edge, \(\varepsilon \). Each line represents an edge in the original network. Panels b, d, f: Change in \((b/c)^*\) when we enhance an edge weight by \(\varepsilon =1\), plotted against the slope \(\Delta (b/c)^*\) of the curves shown in panels a, c, and e at \(\varepsilon = 0\). Each circle represents an edge in the original network. a and b: BA model. c and d: Planted 2-partition model. e and f: Dolphin network

6.3 Sequential edge removal

The nonlinearity in the curves shown in Fig. 3, and the results shown in Fig. 4 and Table 1 indicate that our perturbation theory is not accurate at estimating the amount of change in \((b/c)^*\) upon an edge removal. Therefore, we turn to investigate whether our perturbation theory is good at finding edges to be sequentially removed to decrease \((b/c)^*\) by a large amount in larger networks. Denote by \(G_0\) an original network. We remove the edge with the largest \(\Delta (b/c)^*\), resulting in network \(G_1\). Then, we calculate \(\Delta (b/c)^*\) for each existing edge in \(G_1\) and remove the edge with the largest \(\Delta (b/c)^*\), resulting in network \(G_2\). We repeat this procedure another three times to eventually obtain network \(G_5\), which has five fewer edges than \(G_0\).

A simple rule of thumb to determine edges to be removed to enhance cooperation is to use the degrees of the two nodes forming the edge. In particular, \((b/c)^*\) for the death-birth rule is small for random regular graphs with small degrees (Ohtsuki et al. 2006) and general networks with a small mean degree (Allen et al. 2017). Therefore, we test the performance of our perturbation theory against a degree-based heuristic to remove an edge for enhancing cooperation, which we define as follows. Denote by (i, j) the edge to be removed and by \(k_i\) and \(k_j\) the degrees of the ith and jth nodes, respectively. Note that \(k_i = \sum _{\ell =1}^N w_{i\ell } (= \sum _{\ell =1}^N w_{\ell i})\) for our networks, which are unweighted. For each network, we remove the edge whose \(k_i+k_j\) is the largest. After removing an edge according to this criterion, we select the edge with the largest \(k_i + k_j\) in the reduced network and remove it. We repeat this procedure another three times to remove five edges in total. In our numerical experiments described below, we have verified that the selected edges are always the same if the score for the edge is defined by \(k_i k_j\) instead of \(k_i + k_j\).
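The sequential removal procedure can be sketched as a greedy loop that recomputes the edge scores after every removal. The sketch below implements the degree-sum heuristic for an unweighted network given as an edge list. The function names are hypothetical, and ties are broken arbitrarily but deterministically here, whereas the experiments in this section enumerate all tied scenarios.

```python
from collections import Counter

def degree_sum_scores(remaining):
    """Score k_i + k_j for every remaining edge of an unweighted network."""
    degree = Counter(node for edge in remaining for node in edge)
    return {edge: sum(degree[node] for node in edge) for edge in remaining}

def sequential_edge_removal(edges, score_fn=degree_sum_scores, n_remove=5):
    """Greedily remove n_remove edges, one at a time, always taking the edge
    with the largest score in the current (reduced) network."""
    remaining = {frozenset(edge) for edge in edges}
    removed = []
    for _ in range(min(n_remove, len(remaining))):
        scores = score_fn(remaining)  # recomputed after each removal
        # Ties are broken by the sorted node labels (an arbitrary choice).
        best = max(remaining, key=lambda e: (scores[e], sorted(e)))
        remaining.remove(best)
        removed.append(tuple(sorted(best)))
    return removed

# Toy network: nodes 0 and 1 are hubs joined by an edge.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 5), (1, 6)]
first = sequential_edge_removal(edges, n_remove=1)
first_two = sequential_edge_removal(edges, n_remove=2)
```

Passing a different `score_fn`, for example one that returns the slope \(\Delta (b/c)^*\) of every remaining edge, would turn the same loop into the perturbation-based procedure; that scoring function is not implemented in this sketch.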

We carry out sequential edge removal experiments on three synthetic networks and three empirical networks. Note that the six networks are mostly larger than those used in the previous numerical simulations. For these networks, it is computationally difficult to exactly calculate \((b/c)^*\) for all possible networks with, for example, one edge being removed from the original network.

We show the change in \((b/c)^*\) relative to the original network as we sequentially remove five edges using our perturbation theory by the red lines in Fig. 6. As expected, \((b/c)^*\) decreases, corresponding to negative \(\Delta (b/c)^*\) values, as we remove edges one by one. We also show the result of the sequential edge removal based on the degree sum \(k_i+k_j\) by the blue lines in the same figure. For all networks, there are multiple edges that have the same value of \(k_i + k_j\) at least in one of the five steps to remove a single edge. In this case, we calculated \(\Delta (b/c)^*\) for all the possible scenarios of removing one of the edges that maximize \(k_i + k_j\) in each step of edge removal. This is why we have obtained multiple blue lines in the figure. In all cases, \((b/c)^*\) decreases as we sequentially remove edges with the largest \(k_i + k_j\) value. Figure 6 indicates that the edge removal based on our perturbation theory results in a larger decrease in \((b/c)^*\) than that based on \(k_i + k_j\) for all the networks. To be quantitative, we measured the decrease in \((b/c)^*\) after the removal of five edges compared to the original network with the perturbation theory and with the degree sum. The former was larger than the average of the latter (i.e., average of the blue lines in Fig. 6) by a factor of 1.02, 1.01, 1.02, 1.05, 1.02, and 1.02 for the ER random graph (Fig. 6a), BA model (Fig. 6b), planted 2-partition network (Fig. 6c), lizard network (Fig. 6d), email network (Fig. 6e), and bird network (Fig. 6f), respectively.

Fig. 6

Changes in \((b/c)^*\) upon sequential removal of five edges. a ER random graph with 300 nodes and 900 edges. b BA model network with 300 nodes and 891 edges. c Planted 2-partition network with 300 nodes and 939 edges. d Lizard network with 60 nodes and 318 edges. e Email network with 167 nodes and 3251 edges. f Bird network with 202 nodes and 11900 edges. The red lines represent the edge removal according to the perturbation theory. The blue lines represent the edge removal according to the rank of the degree sum

7 Conclusions

To determine \((b/c)^*\) for an arbitrary network, one needs to solve a system of \(N^2\) linear equations. Because solving a dense system of \(M\) linear equations by Gaussian elimination takes \(O(M^3)\) time, the time complexity is \(O(N^6)\). With the Coppersmith-Winograd algorithm, the time complexity is reduced to \(O(N^{4.75})\), but this is still large (see Sect. 4). In particular, it is computationally costly to carry out graph surgery with various possible edges to be added or removed to compare the results in terms of \((b/c)^*\). Therefore, we have developed a perturbation theory for the graph surgery with which we can evaluate the perturbed \((b/c)^*\) in \(O(N^3)\) time. We have verified that the first-order term \(\Delta (b/c)^*\) obtained from our perturbation theory predicts the rank of the change in \((b/c)^*\) when one removes an edge from the network with a high accuracy. Specifically, we have numerically shown that the edge with the largest \(\Delta (b/c)^*\) value is the one whose actual removal decreases \((b/c)^*\) by the largest amount in two out of the three networks (see Fig. 4a, c, and e). Therefore, we conclude that our perturbation theory is useful for finding, at a reduced computational cost, the edge whose removal efficiently enhances cooperation in the given network.

We focused on the death-birth process because it tends to foster cooperation compared to other rules of strategy updating (Ohtsuki et al. 2006; Szabó and Fath 2007). However, it is straightforward to formulate similar perturbation methods in the case of other updating rules such as the birth-death process (Lieberman et al. 2005; Ohtsuki et al. 2006) and the pairwise comparison rule (Blume 1993; Szabó and Tőke 1998; Nowak et al. 2004; Traulsen et al. 2006) as well as in the case of other payoff matrices. In particular, our theory should be applicable to the case of constant selection (Lieberman et al. 2005; Allen et al. 2021), with which the payoff matrix is independent of the opponent’s action. The perturbation theory may be more accurate for other update rules or games than the combination of the death-birth rule and the prisoner’s dilemma game examined in the present study. Exploitation of our perturbation approach in these directions is left for future work.

Another direction of future work is the interaction between the selection strength and network perturbation. In the present work, we have assumed the weak selection limit. However, one can keep the selection strength parameter (which is \(\eta \) in this article) finite and write down a formal solution. Then, it may be interesting to consider the simultaneous limit of weak selection \(\eta \rightarrow 0\) and weak network perturbation \(\varepsilon \rightarrow 0\) in such a way that \(\eta \) and \(\varepsilon \) are interrelated. Apart from this research direction, assessing the validity of the present perturbation theory under strong selection is left for future work. To this end, we first need to understand the accuracy of the original theory of fixation of cooperation in networks (Allen et al. 2017), on which our theory is based, under strong selection.

We do not know why the perturbation theory is more accurate when one removes an edge than when one adds an edge. Furthermore, we have found that the perturbation theory is fairly accurate at predicting the result for adding a parallel edge where an edge already exists, whereas it is not accurate when adding an edge where an edge does not exist in the original network. In a related vein, we observed nonmonotonic behavior in the cooperativity in terms of \((b/c)^*\) especially when we gradually added a weighted edge (Fig. 3b and f). These results lead us to hypothesize that we can engineer networks that promote cooperation better by considering weighted networks than unweighted networks. These topics also warrant future work.