Policy choices and compliance behavior in pandemic times

In this paper, we model an evolutionary noncooperative game between politicians and citizens that, given the level of infection, describes the observed variety of mitigation policies and citizens’ compliance during the COVID-19 pandemic period. Our results show that different stable equilibria exist and that different ways/paths exist to reach these equilibria may be present, depending on the choice of parameters. When the parameters are chosen opportunistically, in the short run, our model generates transitions between hard and soft policy measures to deal with the pandemic. In the long-run, convergence is achieved toward one of the possible stable steady states (obey or not obey lockdown rules) as functions of politicians’ and citizens’ incentives.


Introduction
The outbreak of the COVID-19 pandemic prompted various, and often contradicting, mitigation responses in countries around the world (see Fig. 1). Countries with similar levels of infection rates, similar demographics and geographic characteristics often implemented very different policies to manage the pandemic [see, for example, Helsingen et al. (2020) for a comparison of the response of Sweden and Norway to the pandemic, and also Gordon et al. (2021)]. Figure 1 plots the Index of Government Stringency (GSI) for a panel of countries observed at four different dates between March 2020 and March 2021. The four quadrants of Fig. 1 show that, given an infection rate, countries adopted different strategies to deal with the diffusion of COVID-19; these mainly concerned national stringency on economic activities and citizen mobility. 1 Moreover, given different infection rates, countries implemented similar containment measures.
This evidence, and the escalation of controversies raised by political leaders about the most suitable policies to face the COVID-19 emergency, generated in the population a feeling of mistrust about the ability of political leaders to efficiently and effectively manage the pandemic (Abbas 2020;Capano 2020;Makridis and Rothwell 2020;Tisdell 2020), with the result that their compliance to the rules depended mainly on their economic and personal incentives. For example, in September 2020 at the beginning of the second wave of pandemic, Boris Johnson publicly stated that "It is very difficult to ask the British population, uniformly, to obey guidelines in the way that is necessary," while responding to a criticism raised by a journalist. 2 In this paper, our model aims at investigating the relationship between politicians' and citizens' incentives to deal with the pandemic. It also explains why similar countries with similar levels of infection rates implemented different containment policies and why countries with different infection rates implemented similar confinement policies. 3 The literature on the COVID-19 pandemic mainly focuses on the effectiveness of different policies aimed at controlling the diffusion of the virus, but is unable to explain the variety of policies that emerged during the time of pandemic waves. 4 Research on COVID-19 aimed at understanding the causes and predictions of the 1 The Government Response Stringency Index is a composite measure based on nine response indicators including school closures, workplace closures, and travel bans, rescaled to a value from 0 to 100 (100 = strictest response). The biweekly growth rate on any given date measures the percentage change in the number of new confirmed cases over the last 14 days relative to the number in the previous 14 days. https://ourworldindata.org/grapher/government-response-stringency-index-vs-biweekly-changein-confirmed-covid-19-cases. 2 The Guardian: https://www.theguardian.com/commentisfree/2020/sep/23/freedom-loving-brits-primeminister-state-conservative, https://www.theguardian.com/world/2020/sep/22/johnson-refuses-to-ruleout-second-covid-19-lockdown-amid-new-restrictions. 3 Dhami et al. (2020) point out that "subjective experiences of lockdown were comparable to those of firsttime prisoners" where "males in lockdown were equally often accused of disobeying the rules of lockdown as were first-time prisoners in being charged with misconduct in prison," and thus indicating the extent of psychological costs associated with a policy of strict confinement. infection trend under different policy scenarios, or the evaluation-from an empirical point of view-of its effects. For example, one branch of the COVID-19 literature assumes a benevolent social planner and, by means of an optimal control problem, studies the economic impact and the pandemic dynamic following different policies based on the classic SIR (Susceptible, Infectious, or Recovered) model or its variations (Acemoglu et al. 2021;Bischi et al. 2022;Gubar et al. 2021). This approach, however, can be criticized because it assumes the full compliance of citizens, which is not always observed in reality. Glaubitz and Fu (2020) and Reluga (2010) claimed that non-compliance is an important determinant of the pandemic evolution. Chang and Velasco (2020) slightly modified the SIR model by endogenizing the decisions of citizens to comply (or not) with lockdown policies, which in turn respond to current economic policies and expectations concerning future policies. Therefore, the infection rate pattern responds to the level of compliance (which in turn responds to economic incentives and expectations). The result is that, under the total compliance hypothesis, a perfect policy could end up being extremely ineffective, if individual economic incentives are not met.
Another branch of the literature uses a game theory approach and mitigates the hypothesis of citizens' perfect compliance, thus investigating individuals' incentives to comply (or not comply) with the Government's mitigation rules. For example, McNutt (2020) applies game theory to analyze the two strategies, i.e., obey and disobey with lockdown rules, that citizens may pursue, and investigates the issue of civic disobedience during the pandemic crisis. However, in McNutt's model, the game occurs only between citizens, while the strategies of politicians are considered/set exogenous. Similarly, in Ye et al. (2021) and Wei and Yang (2020), the dynamics of the pandemic are modeled by an underlying game that determines the behavior of citizens in the face of exogenous strategies undertaken by governments. The third branch of the literature aims at assessing the effectiveness of the variety of mitigation policies on the infection rate from an empirical point of view (Lin and Meissner 2020), or at determining which policy achieves better results (Bargain and Aminjonov 2020;Castex et al. 2020).
Our research paper is framed within the game theory literature, but diversely from previous studies, we endogenize government policy as a response to citizens' compliance and vice versa. Politicians are assumed to be utility maximizers and their utility positively depends on the consensus they have among voters (i.e., supportive citizens) and negatively on the costs of managing the pandemic. Citizens' utility instead depends on several factors. If they decide to obey the politicians' rules, their utility depends on the difference between the individual's economic evaluation of the protection they receive against the virus offered by the lockdown and the economic costs of confinement (job opportunities lost). If they decide to shirk, their utility positively depends on a "psychological effect of shirking" (which in turns positively depends on the length of the lockdown and the fraction of other disobeying citizens), and negatively on the probability of getting sick and being fined for breaking the lockdown rules.
First, we build a one-shot evolutionary game where governments decide either in favor of a soft, or a hard confinement policy as a function of obeying citizens, while citizens may choose to comply or not with the rules according to their personal and economic incentives. We then extend the model to a dynamic game between politicians and citizens and study the coevolution of their strategies over time. In the short run, we provide a rationale for why different policies implemented by countries coexist for a given level of infection rate, and why similar policies are often implemented for different infection rates.
The rest of the paper is organized as follows. Section 2 describes the setup of our one-shot population game, player payoffs, and equilibrium solutions. Section 3 develops the dynamic (evolutionary) game and studies the existence of fixed points and their stability properties. Section 3.1 discusses the results, and Sect. 4 concludes.

Setup of the game
Let us consider a country composed by a population of politicians (P) and a population of citizens (C). Politicians, who determine the actions of the Government, observe the infection rate 5 denoted by I ≥ 0 and decide whether to implement a hard (h) or soft (s) policy to limit the diffusion of the infection. 6 The lockdown policy is implemented for a given durationL ≥ 1 (the duration may be measured in days, weeks or months), and it is fulfilled (i.e., given) by both populations of politicians and citizens. Each citizen, belonging to a population C, observes the strategy played by politicians and decides whether to fulfill such a policy, that is to obey (O), or not (N O).
For mathematical tractability, we normalize the dimension of both populations of politicians and citizens to one. Let us denote by x h ∈ [0, 1] the fraction of politicians choosing a hard policy and by y O ∈ [0, 1] the fraction of obeying citizens. It follows that the fraction of politicians choosing a soft policy is equal to (1 − x h ) and the fraction of disobeying citizens is (1 − y O ).
A politician's strategy is to decide whether to undertake a soft (s) or a hard (h) policy. Each strategy has its own associated payoff, which we denote by π i , with i = s, h. The payoffs π i , i = s, h are assumed to depend negatively on the infection level and positively on the fraction of obeying citizens and on the amount of fines collected from the disobeying ones. Hence, the government's payoffs are defined by: (2) In this model, α ∈ (0, 1) is the marginal disutility given by the infection rate, irrespective of the policy chosen. Instead, the termLω i y O , i = s, h indicates the marginal increase in the politician's payoff as the fraction of obeying citizens increases in the two different regimes (h) and (s). It can be interpreted as the marginal utility of politicians as their popularity increases among voters. Politicians' payoff is also positively affected by the fines collected from non-obeying citizens; i.e., Fσ i (1 − y O ), where F > 0 is the monetary value of the fine, and σ i ∈ (0, 1) is the probability of being fined when the citizen decides not to obey either the policy (h) or (s). Notice that while politicians only gain popularity from obeying citizens, they can only collect fines from the non-obeying ones. We assume that ω h < ω s with (ω s , ω h ) > 0 and σ h > σ s , with σ i ∈ (0, 1); i = s, h. The first assumption, ω h < ω s , reflects the fact that politicians earn higher popularity when they implement softer policies. This assumption is potentially debatable, but we believe it could be realistic in most situations. Despite cases in which this hypothesis could be criticized, such as the 2020 soft policies undertaken by Boris Johnson, that caused him a dramatic loss in popularity, in the majority of countries hard policies have been harshly criticized by the public opinion (Vlandas and Klymak 2021). 7 Instead, the second assumption (σ h > σ s ) implies that the fraction of disobeying citizens fined, is higher under a hard policy, because it is easier to discover those who do not respect the rules when all economic activities are closed down.
Let us define the payoff functions of the citizens. Citizens observe the policy i = {h, s} implemented by the Government and its duration, and decide whether to obey or not. Citizens who decide to comply with the policy (i.e., obey) obtain a utility π O |i ∈ R, which is positively affected by B, i.e., citizens' evaluation of the health benefits that the lockdown offers in terms of the reduced probability of contagion. When complying with the policy, citizens also bear a cost in terms of job opportunities lost C i , which is proportional to the duration of the lockdown. We assume that B and C i are equal for all individuals, and that people act as rational and selfish individuals. 8 Both the strictness of the policy and its duration are known to the citizens. Thus, the payoff functions of obeying citizens, under hard and soft lockdown regimes, are, respectively: where μ ∈ (0, 1) and C h ≥ C s reflect the fact that both protection against the virus and the economic costs faced by individuals are higher under hard lockdown. Non-obeying citizens, on the other hand, receive a payoff function π N O ∈ R, defined by: where P i (L, y O ) is a psychological benefit (i.e., the "psychological effect of shirking") associated with the type of policy i = {h, s}, which depends positively on the duration of the policy, and negatively on the number of obeying citizens. The latter are aimed at catching the "tiredness effect" and the "peer effect" that are well-known in the medical literature concerning the consequences of lockdown on individual mental health. 9 Moreover, non-obeying citizens incur the risk of becoming infected with probability θ ∈ (0, 1] and bearing the consequent cost (in terms of health lost) equal to D ≥ 0. 10 Non-compliance with the rules exposes the non-obeying individual to the possibility of being fined. They will incur fines or infringements F > 0 as a consequence of their non-compliance with policy rules, with probability σ i ∈ [0, 1]. When the policy is soft and economic activities are running (even with some limitations), it is more difficult to discern whether individuals are non-obeying or if they are simply doing what they are allowed to under the softer policy. Therefore, in what follows, we will assume that for a public official it is more difficult to catch a non-obeying citizen when the policy is soft, with the consequence that σ h > σ s , and (σ h , σ s ) ∈ (0, 1). 8 These assumptions are introduced for mathematical tractability. There are indeed several indicators suggesting that both B and C i might not be the same across individuals. For instance, it was found that Black Brazilians were twice as likely as white Brazilians to have had COVID-19 symptoms. At the same time, they were likelier to suffer economic consequences of pandemic in the form of job losses or pay cuts (The New York Times, https://www.nytimes.com/article/brazil-coronavirus-cases.html). Also, the King's College of London's survey reveals that some of the reasons for non-compliance are "going out to work" and caring for others' need (the King's College London survey in Bloomberg Opinion, https://www.bloomberg.com/ opinion/articles/2020-09-30/covid-19-disobedience-goes-deeper-than-we-think, MoveHub https://www. movehub.com/blog/best-and-worst-covid-responses/). However, citizens seem to obey when economic losses due to quarantines are compensated like in Austria and Iceland (the King's College London survey in Bloomberg Opinion, https://www.bloomberg.com/opinion/articles/2020-09-30/covid-19-disobediencegoes-deeper-than-we-think). 9 It is indeed confirmed by the medical literature that a reduction of social contacts is associated to an increase of mental health impairments (Christoph et al. 2020;Murphy et al. 2020). Moreover, Tunçgenç et al. (2021:13) found evidence of peer effect in the compliance to the rules by individuals. 10 D = 0 means that those who become infected are asymptomatic, therefore their loss of health is null.
Note that we are assuming that citizens who do not obey with the policy do not face any economic cost. Therefore, for these citizens, C i = 0, which implies that they do not receive any protection from the virus B. We define the psychological benefit under hard and soft lockdown as follows, i.e., P h (L, y O ) =L +γ (1− y O ) and P s (L, y O ) = βL + γ (1−y O ), where β ∈ (0, 1) means that the tiredness effect is lower when the lockdown is soft, because it decreases in proportion to the disobeying citizen's payoff during soft lockdown. Moreover, the tiredness effect is greater in proportion to the duration of the lockdown (i.e., the greater the effect of the psychological shirking benefit of the non-obeying citizen, the longer the duration of the policyL ≥ 1). Parameter γ ≥ 0 measures peer effects. Indeed, disobedience toward the policy-while all other citizens are obeying-does not produce any increase in the psychological benefit of shirking. That is to say, the psychological benefit from non-obeying the policies increases as more and more citizens do not obey such policies. Therefore, the payoff of non-obeying citizens under a hard and a soft policy is: The equation for the psychological shirking benefit is supported by the literature. In the behavioral literature, Tunçgenç et al. (2021:13) find that "the best predictor of people's adherence to distancing was perceived adherence of their close circle, which exceeded the effect of people's own approval of the rules," so as to corroborate our hypothesis. Tuncgenc et al. (2021) conducted a global study through a survey held from April to June 2020, involving 6500 participants. They asked people how much they, their friends, and their fellow citizens approved of and observed the COVID-19 guidelines in their country. This is crucial, as people are more likely to accept new social rules when they have high expectations that others are following those rules as well. Their findings do not support the individualist assumptions of many governments: People who followed guidelines the most were not those who found the rules more justified, or those who were more vulnerable to the disease. The most diligent rule followers were, consistently, those whose friends and families were following the rules. Further, people who were more vulnerable to COVID-19 were more likely to follow the guidelines if they had a wide social circle. These findings applied across age groups, genders, countries, and were independent of the strength of local restrictions. Indeed, in the medical literature some authors found a positive association between perceived changes in everyday life and a more significant decrease in social contacts, with higher mental health impairments (Christoph et al. 2020). Moreover, Murphy et al. (2020) found that "people become less compliant, the longer they had experienced lockdown." Additionally, a common sentiment that circulates among entrepreneurs, after long lockdown periods, was that in spite of all the restrictions imposed by the government, it might be better to ignore them and risking getting fined, than closing down all activities.

Threshold levels and optimal strategies
Let us compute the expected payoffs. With some easy calculations, the expected payoff of a politician under hard lockdown is given by Eq. 8, and the soft lockdown expected payoff is given by Eq. 9. That is, for politicians, the expected payoffs from hard and soft policies are: which are weighted averages of the payoffs under the two policies, and the weights are the number of obeying and disobeying individuals. Comparing the expected payoffs of hard and soft policies, politicians will choose a hard lockdown if E P h > E P s ; otherwise, they will choose a soft one. Therefore, their choice depends on the fraction of obeying citizens y O , because politicians will choose this strategy h if and only if the fraction of obeying citizens is lower than a given threshold value, i.e., Recall that σ h > σ s and ω s > ω h , so 0 ≤ y * O ≤ 1. Hence, politicians' decision to choose a hard or a soft policy is based on the ratio between the additional fines collected under a hard policy regime (the numerator of equation 10) and the trade-off between the excess fines collected and the popularity lost by politicians (the denominator of equation 10). Equations 1 and 2 show that politicians only obtain additional popularity from obeying citizens with a soft policyL(ω s − ω h ). Further, additional fines under a hard policy, compared to a soft policy, F(σ h − σ s ), are only collected from nonobeying citizens. If the fraction of obeying citizens is above the threshold y * O , additional popularity of a soft policy prevails on politicians' utility over additional fines collected under a hard policy. The opposite holds if the fraction of obeying citizens is below y * O . Therefore, when the fraction of obeying citizens is lower than y * O , politicians will implement a hard policy; otherwise, they will implement a soft policy.
For citizens, the expected payoffs of the two strategies, i.e., obey and non-obey, are: Citizens will choose the obey strategy (O), instead of non-obey strategy (N O), if and only if E C O ≥ E C N O . This is the case when the fraction of politicians choosing hard x h is below or beyond a given threshold x * h , where In what follows, we will consider and make use of the following definitions: meaning that K is the excess utility of non-obeying over obeying under a soft policy, when the peer effect is assumed to be zero. It can be rewritten as The term W , on the other hand, represents the same measure under strict policy and can be rewritten as The term D F is the excess of fines collected under a hard policy as opposed to a soft lockdown, and DL is the excess popularity enjoyed by adopting a soft lockdown policy versus a hard policy. (From another perspective, it can be read as the popularity lost by applying a hard policy relative to the popularity that could have been achieved by applying a soft lockdown policy.) The threshold level of obeying citizens y * O represented by Eq. 10, in which politicians find it optimal to adopt a hard policy, and by using Eqs. 16-17, can be rewritten as: Similarly, by means of Eqs. 14-15, the threshold 13 may be rewritten as: The term x * h can be interpreted as the fraction between the excess payoff of obeying with respect to non-obeying under a soft policy (the numerator) and the tradeoff between non-obeying and obeying under a hard and a soft policy (the denominator). Therefore, while the numerator is affected by the peer effect, the denominator is not. Given that W and K can be either positive or negative, and seeing that x * h must necessarily be positive and fall between 0 and 1, two cases follow. However, depending on the sign of the denominator in equation 19, different scenarios may arise.
1. Case 1. Both the numerator and denominator of 19 are negative, which implies that citizens choose to obey if the fraction of politicians choosing a hard policy is greater than the threshold, x h > x * h , i.e., the excess utility of disobeying with respect to obeying under a hard lockdown is lower than in a soft one. This case divides into four subcases: • Case 1a: W < 0 and K > 0. The trade-off between obey and disobey under a hard policy is positive, while it is negative under a soft policy. When politicians choose hard, citizens' dominant strategy is obey, while if politicians choose soft citizens' dominant strategy is disobey. This is a case in which the reduced probability of being fined under a soft policy makes it easier for people to break the rules in order to avoid the policy cost. • Case 1b: W < 0, |W | > |K |, K < 0 and −K < γ (1 − y O ). The trade-off between obey and disobey is positive both in the presence of hard and soft policies, but the excess utility of obey with respect to disobey in soft is smaller than the peer effect. Therefore, in the case of a soft policy, if the number of obeying individuals is "large enough," the peer effect may prevail over K and disobey is the citizens' dominant strategy. • Case 1c: W > 0, K > 0 |K | > |W |. The citizens' dominant strategy is disobey both under hard and soft policy regimes. This result holds irrespectively of the peer effect value. • Case 1d: −W = γ and K = 0. The trade-off between disobey and obey is zero at point (0,1), i.e., soft and obey. At point (1,0), i.e., hard lockdown and disobey, the excess utility of disobey with respect to obey (net of the peer effect) is exactly equal to the peer effect. In other words, this is case in which citizens are indifferent toward the two strategies.
2. Case 2. Both the numerator and denominator of 19 are positive, which implies that citizens choose to obey if the share of politicians choosing hard is lower than the threshold, Here, the excess utility of obey with respect to disobey under a soft policy is positive and greater than the peer effect. Therefore, the citizens' dominant strategy in the case of a soft policy is to obey, irrespectively of the number of disobeying citizens. Instead W can be positive or negative. Case 2 can be divided into two subcases: in the presence of a hard policy, while obey is the dominant strategy under a soft policy; this is true since K prevails, in absolute value, over the peer effect. • Case 2b: It follows that, depending on the fraction of obeying citizens, multiple equilibria may arise. We will extend this static game to a dynamic one in the next Section. Further, we will study the replicator dynamics, the existence of its fixed points, its stability properties and the dynamics surrounding/accompanying them.

The evolutionary game
Let us consider politicians' expected payoffs from the strategies hard and soft as a function of the share of obeying citizens, as in Eqs. 8 and 9. It follows that the average payoff for politicians can be calculated as a weighted average of the expected payoffs from the two strategies. That is, the average payoff of politicians is: Similarly, given the citizens' expected payoffs for strategies obey and disobey, as in Eqs. 11 and 12, we can compute the citizens' average payoff, i.e., The Replicator Dynamic (RD) in continuous time (Sandholm, 2010;Weibull, 1995) of the strategies hard and obey is therefore represented by the following system of two differential equations, 11 or, if we substitute Eqs. 8, 9, 11 and 12 into 22 and rearrange them as follows, The RD system indicates that the number of agents adopting a given strategy increases if the expected payoff of that strategy is greater than the average payoff attained by the population they belong to, otherwise it decreases. This RD system (23) is a nonlinear two-dimensional dynamical system in continuous time. The first step to understanding its qualitative dynamic behavior is to investigate the existence of fixed points in the action space of the players (i.e., the Cartesian Cartesian plane), each one with its own stability properties. "Appendix" herein contains the computation of all the seven fixed points and an analysis of their stability properties. As shown in "Appendix," the fixed points that exist in this system (23) are the four corner solutions, that is (0, 0), (1, 0), (0, 1) and (1, 1), i.e., two border fixed points, 12 which we define as E b1 and E b2 and are equal to: and an interior solution (x * h , y * O ). From the analysis of the stability properties of the seven different equilibria, it is possible to prove that, under certain configurations of the parameters, only two equilibrium points can be considered asymptotically stable, at least "in most cases." 13 These two stable equilibria that arise from the model are (0, 1); that is, all politicians choose a soft policy and all citizens  (1, 0); namely, all politicians choose a hard policy and all citizens disobey. From a policy perspective, it is also interesting to study the global dynamics involved in the interior equilibrium, which may provide insight into the rationale of politicians who want to induce the system to converge toward one of the preferred stable equilibria (provided there are multiple stable fixed points). The first stable equilibrium (0, 1) is the one characterized by the soft policy chosen by policymakers, with all the citizens obeying. As is discussed in the "Appendix," a sufficient condition for stability of this equilibrium is that K < 0, i.e., that the dominant strategy for citizens under soft is to obey. Citizens' dominant strategy under a hard policy can be either obey or disobey, depending on the value of W , that can either be positive or negative.
The second stable equilibrium is (1, 0) when all politicians choose to adopt a hard policy and all citizens disobey. This equilibrium is characterized by the fact that −W < γ ; in other words, the excess utility of disobey with regard to obey (W ), net of the peer effect, can be positive or, alternatively, negative. However, it will be smaller than the peer effect.
The dynamics leading to those different equilibria, however, may be different depending on the parameters of the problem case (1 or 2) in which we find ourselves; subsequently, the subcase we are in is also determined.
In this section, a graph illustrates the diverse dynamics that arise in the six different scenarios; we accompany this with a discussion thereof. To simulate the dynamics of the model in the six different cases, we selected a set of feasible parameters for each case and plotted their corresponding dynamics. The parameters used to simulate the different dynamics are those listed in Table 1.
Let us analyze the dynamics case by case: • Case 1a: As we mentioned in Sect. 2.1, this case is characterized by the fact that citizens' dominant strategy under a soft policy is disobey, as suggested by K > 0, while the dominant strategy under a hard policy (net of the peer effect) is obey. The peer effect, however, is fairly large (γ = 6.5) and may offset W , when the number of disobeying citizens is large enough. 14 It follows that when the number of non-obeying citizens increases, the dominant strategy under a hard policy may become disobey. This leads to the dynamic shown in Fig. 2: The dynamics are represented by a growing spiral starting from the internal equilibrium, which converges to the only stable fixed point represented by (1, 0). Indeed, if we assume that obey and hard are frequent strategies (so that we are at a point north-east of the interior equilibrium), politicians will try to maximize their utility by choosing a soft policy, because this allows them to gain utility from increased popularity. Diversely, if politicians choose a soft policy, citizens' dominant strategy is disobey, so y O decreases. At this point, since the number of obeying citizens decreases, fines become more important for politicians; thus, they will try to maximize their utility by choosing a hard policy. However, at this point the dominant strategy for citizens becomes obey, and as long as the share of disobeying citizens is not very large, this remains the dominant strategy. As a consequence, the system continues rotating around the internal equilibrium, until it reaches the stable state (1, 0). Indeed, given this set of parameters, the eigenvalues associated with the equilibrium (0, 1) are one positive and one negative. However, those associated with the (1, 0) equilibrium are all negative, so that the system converges toward this equilibrium point. (More details can be found in "Appendix.") • Case 1b: W and K are both negative. Therefore, obey, net of the peer effect, is a dominant strategy under both hard and soft lockdowns. However, the peer effect is quantitatively large and may offset the excess utility of obey compared with disobey, even when the share of disobedient citizens is low. The size of the peer effect is particularly visible when the policy implemented is a soft lockdown. When the lockdown is hard, the share of disobeying citizens must be quantitatively significant so that the peer effect offsets W . This case is shown in Fig. 3. Assuming that obey and hard policies are frequent strategies, such as at a point north-east of the interior equilibrium, politicians will try to maximize their popularity, given that the share of obeying individuals is larger than y O . However, since K < 0, citizens' choice of obey becomes easier to make, and as long as the number of obeying citizens is large enough, i.e., −K > γ (1 − y O ), the dominant strategy for citizens remains obey and the system converges toward (0, 1). In other words, all the politicians choose a soft policy and all the citizens obey. Diversely, when hard and obey are rare strategies, such as at a point south-west of the interior equilibrium, politicians will try to maximize their utility by imposing fines. Therefore, they will choose a hard policy since the number of obeying individuals is small. But for citizens, the dominant strategy under hard is to disobey, since the number of disobeying citizens is large and the (absolute value of the) peer effect γ (1 − y O ) offsets W . This leads the system to converge toward (1,0) where all politicians choose a hard policy and all citizens disobey. The set of parameters W and K both negative leads to the local stability of both equilibria (1,0), that is, hard and disobey, and (0,1), that is, soft and obey. Further, the eigenvalues of the first (1,0) equilibrium are both negative (λ 1 = −DF and λ 2 = −W − γ ), and so are the eigenvalues of the second (0,1) equilibrium (λ 1 = −DL and λ 2 = K ) (see "Appendix"). • Case 1c: W and K are both positive. Therefore, the dominant strategy under both soft and hard lockdowns is disobey, irrespective of the peer effect value. Furthermore, W is greater than K , i.e., the excess utility of disobey with respect to obey is greater under hard than soft. This case basically represents a situation in which it is never easier for citizens to respect policy rules; this is because the economic costs avoided and the psychological benefits of disobeying are greater than the potential benefits of not getting infected by complying with the confinement rules. Costs would include the risk of health loss due to the chance of contracting the virus, and the potential costs of fines. This scenario leads to a dynamic shown in Fig. 4. In this case, the equilibrium represented by soft and obey is locally unstable. Its associated eigenvalues are λ 1 = −DL and λ 2 = K , which are negative and positive, respectively (see "Appendix"). Therefore, equilibrium (0, 1) is a saddle point. The equilibrium represented by hard and disobey, instead, has eigenvalues equal to λ 1 = −DF and λ 2 = −W − γ , which are both negative. Furthermore, whatever the initial conditions are, the system will converge, sooner or later, to the equilibrium (1, 0), i.e., to hard and disobey. • A special case 1d: edge-to-corner bifurcation. It is worth noting that the equilibria (0, 1) and (1, 0) change their stability properties as a consequence of a bifurcation. When E b 1 and E b 2 are included in the player action space, then the equilibria (0, 1) and (1, 0) are stable. For specific sets of parameters when K = 0 and −W = γ , it may occur that (0, 1) = E b 1 and (1, 0) = E b 2 (the boundary equilibrium disappears through the transcritical bifurcation, despite the fact that these conditions are not necessarily simultaneously met), and equilibrium (0, 1) inherits the stability properties of E b 1 and equilibrium (1, 0) those of E b 2 . When both bifurcations simultaneously occur, i.e., when (0, 1) = E b 1 and (1, 0) = E b 2 , none of the equilibria are locally stable. (All the equilibria have at least one nonnegative eigenvalue at their associated Jacobian matrix.) Therefore, this special case is characterized by a perpetual dynamic characterized by a diverging spiral starting from the internal equilibrium. From an economic point of view, as we suggested in Sect. 2.1, in this case the excess utility of obey with respect to disobey (net of the peer effect) is zero, and a situation in which the excess utility of obey (net of the peer effect) under hard is equal to the peer effect. Herein, citizens diverge from the equilibrium (0, 1), i.e., soft and obey, because they are indifferent toward the two strategies. Since disobeying citizens do not exist, K = 0 means that no strategy is dominant at the (0, 1) point. This set of parameters also makes the other equilibrium (1, 0), i.e., hard and disobey, unstable (Fig. 5). This is true given that-at this point, since all citizens are disobeying the rules-no strategy is dominant. Indeed, the excess utility of obey with regard to disobey (−W ) is exactly equal to the peer effect. (The equality −W = γ is verified.) Thus, once again, citizens are indifferent toward the two strategies. • Case 2a: In this case, as mentioned in Sect. 2.1, W − K is positive and this leads to two real eigenvalues associated with the Jacobian computed at the interior equilibrium point. W is positive, and therefore citizens' dominant strategy under hard is disobey, irrespective of the peer effect, while the dominant strategy under a soft policy, net of the peer effect, is obey. (Indeed K is negative.) The peer effect is small compared to the value of K , therefore it never offsets the excess utility of obey with respect to disobey. Further, it implies that under a soft policy the dominant strategy for citizens is obey. This situation gives rise to the dynamics shown in Fig. 6. Whatever the initial conditions are, the system will rapidly converge to either (0, 1) or (1, 0). The basin of attraction of the equilibrium (0, 1) is quite large, consisting in more than half of the surface of the action space representing the set of feasible actions. • Case 2b: In this case, as pointed out in Sect. 2.1, citizens' dominant strategy (net of the peer effect) under both hard and soft lockdowns is obey; however, the peer effect may offset both K and W if the number of disobeying citizens becomes large enough. Therefore, assuming that hard and obey are frequent strategies, politicians will try to maximize their utility by seeking greater popularity from the obeying individuals; thus they will choose soft. Two cases exist. First of all, if the number of obeying individuals is large enough and the peer effect does not offset the excess utility of obey with respect to disobey under soft, the system will converge toward (0, 1). Secondly, instead-if the number of obeying individuals is small-then the peer effect will offset K so that citizens will start disobeying. As long as the number of disobeying citizens increases, politicians will try to increase their payoff by imposing fines and will therefore choose a hard policy. But when the number of disobeying citizens is large, the peer effect offsets W in absolute value, and citizens' dominant strategy will become disobey. The system will end up converging toward (1, 0), i.e., to a situation in which all the politicians choose a hard policy and all the citizens disobey. This dynamic is represented in Fig. 7.

Discussion
Results from the analysis of the six scenarios may provide thought-provoking/attractive hints for policy makers. The different dynamics governing the convergence to one of the equilibria, depending on which of the previous cases we consider, can somehow be interpreted as a "shortterm" response to the actions undertaken either by the politicians' and citizens' populations. The "long-run" response is, of course, represented by the stable fixed point to which the system (i.e., the populations) converges.
Only incentives can determine short-term responses and, therefore, the system dynamics in disequilibrium, as well as the convergence to one of the possible stable fixed points.
The short-term dynamics that arises from our model sheds light on the reasons for which, during the COVID-19 pandemic that occurred between 2020 and 2021, we observed countries facing similar levels of infection but responding very differently in terms of containment levels. The analysis performed identified two broad scenarios that are characterized by very different dynamics. The first, indicated by case 1 in Sect. 2.1, which generates oscillations around the internal equilibrium, gives rise to a diverging spiral. The second, indicated by case 2, which generates a uniform path toward one of the two stable fixed points, depending on the initial conditions. The number of fixed points and their stability properties may vary, depending on the scenario.
The second case (which includes scenarios 2a and 2b) generates two stable fixed points, i.e., (0, 1) and (1, 0), which indicate soft and obey, and hard and disobey strategies, respectively. Here, a change in the values of the feasible parameter set does not change the stability properties of the two points, but has only the effect of changing their basin of attraction.
In the first case (scenarios 1a-1d), the number of stable fixed points may vary between two, one, and zero. Two stable fixed points arise when the dominant strategy for citizens (net of the peer effect) under both hard and soft lockdowns is obey, but the peer effect is large. This leads to dynamics represented by a diverging spiral from the internal equilibrium that ends in one of the two stable points, exactly which one depends on the initial conditions. One fixed point arises when i) the dominant strategy for citizens in both regimes is disobey; this is irrespectively of the value of the peer effect (case 1c), or ii) when citizens' dominant strategy under soft lockdown is disobey, while the dominant strategy under hard lockdown (net of the peer effect) is obey, but the peer effect is large (case 1a). In these subcases, the diverging spiral ends in the unique stable fixed point (1, 0), that is, hard and disobey.
In some circumstances, the system may also have zero fixed points. From a mathematical point of view, when citizens are indifferent between obeying and disobeying at the two points (1, 0) and (0, 1), the two boundary equilibria E b1 and E b2 collapse with the two corner equilibria (0, 1) and (1, 0). Moreover, the latter inherits the stability properties of the former (i.e., we have an edge-to-corner bifurcation). This implies that the system perpetually rotates around the internal equilibrium (Fig. 5).
As stated in Introduction, our initial research question was to explain the cross-country differences observed in the governments' management of the pandemic, given similar infection levels. However, our model may also provide a hint on what could happen within a single country over time. For example, the "diverging spiral" that describes the system dynamics as shown in case 1 above may justify the containment policy implemented by the Italian Government from September 2020 through April 2021. Indeed, the Government reclassified regions and municipalities according to the intensity of the spread of the infection and assigned them different colors accordingly. This classification was periodically revised (weekly or every two weeks, and even more often under some circumstances, e.g., the Christmas period, when harder policies were implemented to counter-balance the higher incentives to disobey). Each color represented a different degree of strictness in the lockdown imposed (red = hard lockdown; orange = medium lockdown; and yellow = soft lockdown). Over the two-year Pandemic period, all Italian regions and municipalities alternated short periods of hard policies with other periods of soft policies, and their dynamics resembled those shown in Figs. 2, 3, or even 5. Fortunately, with the mass public vaccination policy that began in late February 2021, the dynamic situations changed dramatically in favor of a more stable dynamic characterized by obeying citizens and relatively soft policies.
Our model predicts that incentives determine whether we can find two, one or zero stable fixed points and, in the case of two steady states, their basin of attraction. When citizens' incentives are such that the dominant strategy is always disobey (or, in alternative, when the peer effect is large enough to offset the excess utility of obey compared to disobey under a hard policy), the system allows a unique stable state hard and disobey. If we assume that the two stable fixed points exist, i.e., hard and disobey, and soft and obey, and that the first equilibrium is not socially desirable, the question that a politician may ask himself is how to coax the system to end in an equilibrium that is more desirable from a social point of view, i.e., in which all citizens obey and politicians choose a soft lockdown policy. On the other hand-from our point of view-the question is "How is this equilibrium more likely to emerge"? In other words, how is it possible to increase its basin of attraction or implement policies that modify the parameter values of citizens' payoff functions, to increase the chance that a desirable outcome emerges? A general answer is to increase the payoff of obeying citizens while reducing the payoff of the non-obeying ones. Several ways could be pursued to reach this goal. One would be to implement policies aimed at increasing protection against the virus for those who obey. Alternatively it could be possible to compensate the economic costs of obeying for those who do obey. The latter would be to avoid cost considerations from prevailing over the individual decision to not obey. Indeed, from an empirical point of view, it is true that higher levels of compliance with lockdown rules were attained in countries where the population is richer or the government mitigated the economic costs of isolation, such as in Austria and Iceland (Wright et al. 2020). Moreover, a government should approve and enact policies aimed at assuring the transiency of the losses that lockdown policies may generate. The fear of a permanent loss of one's job (and consequently a permanent loss of future income) may induce citizens to disobey. Therefore, policies aimed at sustaining firms, who do not fire their employees, may help to reduce C i . However, ambiguous effects on the chance of reaching the desirable outcome (0, 1) may be due to how governments modulate fines and the length of the lockdown policy. Indeed, if increasing the expected fine is a way of decreasing the payoff of the non-obeying citizenseither by increasing the fine value F or the probability of catching non-obeying citizens (i.e., by increasing controls of compliance with the rules), it is also true that an increase in the expected fine induces politicians to choose a hard lockdown, instead of a soft one. A similar result can be obtained for the duration of a policy. If, on one hand, the lengthL increases the psychological benefit of disobeying and increases the economic cost of obeying, on the other, it increases the probability that a politician might choose a hard lockdown policy. Therefore, these effects may compensate one another. An issue (optimal fines versus optimal length of the lockdown) that remains for future research.

Concluding remarks
In this paper, we developed a noncooperative evolutionary game with two populations of politicians and citizens which can rationally explain the evidence that similar countries with similar levels of infection implemented very different containment policies. This model can also justify why countries with different levels of infection adopted very similar containment policies, and-moreover-why we may observe a rapid alternation of hard and soft measures within a single country and a short period of time. Our model is interesting from a dynamics point of view because it offers a rationale for a variety of short-term policies which characterized the months in which the COVID-19 pandemic was at its relative peak.
Incentives faced by the populations of citizens are important for determining the dynamics of the system. We identify six different scenarios according to citizens incentives, each one leading to a different dynamic in the existent stable steady states. If politicians maximize their payoffs by searching for popularity or by maximizing revenues through fines, the economic incentives of citizens and the peer effect play a crucial role in determining the number of fixed points, their stability properties, and the dynamics converging to them. In particular, when citizens' excess utility of disobey compared to obey is lower following politicians' choice of a hard confinement policy over a soft one, an oscillatory dynamic arises that is characterized by rapid changes in the strictness of the confinement policy and on the level of citizens' compliance. Diversely, the system converges uniformly toward the attractive fixed point, depending on the initial conditions.
Economies providing different individual incentives may therefore generate different dynamics which-in the short run-may rationally explain the evidence cited above which also motivated this research paper.
In the "long term," our analysis shows that, depending on the parameter set, the two strategies of citizens and politicians converge toward either hard policy and disobey or soft policy and obey (in the case in which two steady states coexist), or to hard policy and disobey (in case of the existence of only one steady state). Under certain circumstances, the system may be globally unstable and continue rotating around the interior equilibrium. Convergence toward the stable fixed point or the perpetual oscillation around the interior equilibrium can only be avoided by an "exogenous shock" such as a massive vaccination campaign, for example.
We are aware that when setting up the model, we made some assumptions that might be debatable for some readers. One of those is certainly the hypothesis that a soft lockdown is a more popular policy than a hard lockdown, which in our model implies ω s > ω h . Indeed, in some countries the contrary could be true. During his presidency, Donald Trump continued to reject the possibility of implementing a lockdown policy in the USA, claiming that the cure for the coronavirus-meaning the economic costs of the lockdown (which he claimed would be around 15 billion dollars per day)-would have been worse for the country than the cost of lives lost. As is noted, several months later Trump lost his reelection bid; thus, a lockdown-related decision undoubtedly played a significant role in the 2020 presidential elections in the USA. Similarly, in the UK, for example, if-at the beginning of the pandemic the P.M. Boris Johnson ruled out any hard policy on lockdown-following the collapse of his popularity due to the increasing number of deaths, he rapidly changed his mind and put Britain under a hard curfew. He announced that this would last while the UK waited for the availability of vaccines. A relaxation of this hypothesis may possibly generate new equilibria or different stability properties of the equilibria generated by our model, which may explain even better the differences existing between countries concerning policies on the rate of infection. We believe that this work can represent a valuable starting point for future investigation.
On a final but not less important note, we must point out that we did not study the evolution of the infection rate, which is assumed herein to be given and constant over time. This variable was set for mathematical tractability and to avoid diverting attention from the rationale governing the variegated policies observed at a global level. Studying the evolution of the infection rate under this setup could imply incorporation of the evolutionary selection mechanism of citizens' and politicians' choices into an SIR model; however, for the reasons outlined above, such a move would go beyond the scope of this manuscript. Nevertheless, it could undoubtedly represent a possible topic for future research.
The other non-corner equilibria can be easily computed with few manipulations of the equations in the system. Let us rewrite the system 23, withẋ h = 0 andẏ O = 0: If we allow x h to be equal to zero, it is clear that the first equation of the system is satisfied for any feasible value of y O , but not the second. If we substitute x h = 0 in the second equation of the system, we get that: So, another solution (a boundary solution), that we have defined as E b 1 is equal to Similarly, if we assume that x h = 1, it is clear that, once again, the first equation of the system is satisfied for any feasible value of y O . The second equation instead becomes with solution equal to Therefore, the second boundary solution to the system, which we have defined E b2 , is equal to The last (internal) equilibrium can be easily computed by dividing both right-and left-hand sides of the first and the second equations, respectively, for x h (1 − x h ) and y O (1 − y O ). After some rearrangements, we can express the first equation as: which, substituted into the second, after some manipulations yields the already known result for x h : Therefore, another equilibria for the system are Summarizing, we can claim that the system admits the existence of at most seven fixed points. Now, we assess whether the seven equilibria are asymptotically stable (unstable) points by analyzing the Jacobian Matrix J (ẋ h ,ẏ O ) computed at the equilibrium points. The local stability of the each equilibrium point is determined through the usual linearization procedure, i.e., according to the study of the eigenvalues of the Jacobian matrix. The idea is to use the eigenvalues of the Jacobian matrix evaluated at a critical point to understand the behavior of the system near that critical point (Bischi et al., 2015). For the RD system (23), its Jacobian matrix is given by: The stability analysis at the corner equilibrium, as well as at the boundary equilibria, is straightforward, as the Jacobian matrix is diagonal at any corner equilibrium and triangular at any boundary equilibrium, and hence, in both cases the eigenvalues are given by the diagonal entries.
Let us analyze the equilibria one by one, rearranging according the notation introduced in Eqs. 14-17. That is: • At the equilibrium point (1, 1), In this situation, we can easily prove that the eigenvalues of J (1, 1) are equal to λ 1 = DL and λ 2 = W . By definition, DL is positive, so is λ 1 . Depending on the choice of the parameters, λ 2 could be either positive or negative. This means that at least one (and possibly two) eigenvalues are positive, which make this fixed point asymptotically unstable.
• At the equilibrium point (0, 0), we have: In this case, it is trivial to compute the eigenvalues, since they are equal to λ 1 = DF and λ 2 = −K − γ . As said before, λ 1 is always positive by definition, so, whatever the sign of K , this point has always at least one eigenvalue positive, which make this equilibrium asymptotically unstable.
• At the equilibrium point (0, 1), we have: The eigenvalues here are trivial to compute. The first eigenvalues λ 1 are equal to λ 1 = −DL, which is negative by definition, while λ 2 = K , which might be positive or negative.
If K < 0, that is to say, if the excess utility of non-obey with respect to obey under soft lockdown, net of the peer effect, is negative, then the equilibrium is asymptotically stable. • At the equilibrium point (1, 0), we have: Once again, it is trivial to show that λ 1 = −DF and λ 2 = −W − γ . λ 1 is negative by definitions, since the conditions on the parameters were such that σ h > σ s . Moreover, depending on the sign and size of W , if the excess utility of non-obey with respect to obey under hard lockdown (net of the peer effect) is greater than the peer effect, then also λ 2 is negative, and this condition guarantees that this equilibrium is asymptotically stable. • At the border equilibrium E b1 = (0,L (β+C S )−μB−Dθ −Fσ s +γ γ ) = (0, K +γ γ ), the Jacobian is: Exploiting Eqs. 14-17 and manipulating a bit, we obtain that: Since the matrix is triangular, it is easy to prove that λ 1 = J 11 and λ 2 = J 22 . The analysis of the sign of J 11 and J 22 is therefore crucial to determine the sign of the eigenvalues and therefore the stability of the border equilibrium E b1 . It is easy to see that J 11 is negative if two conditions hold, alternatively: a) K > 0 or b) K < 0 and γ > K (DL+DF)

DL
, and it is positive if K < 0 and 0 < γ < K (DL+DF)

DL
. At this border equilibrium E b1 , it is easy to see that J 22 is negative when K > 0 or when K < 0 and 0 < γ < −K (i.e., the peer effect is "small enough") and positive otherwise. Moreover, it can be easily noticed that at this corner equilibrium J 22 can be rewritten as: which is always negative when K > 0 and positive otherwise. Recall that y * O is always positive since it is included between 0 and 1. It follows that if K > 0, the equilibrium is stable, but it lies outside the strategy space of the players and then it does not exist as a feasible outcome. If, instead, K < 0, we can claim the presence of a saddle point only if the condition holds, while in the other case we can state that E b1 is an unstable point.
, the Jacobian matrix is: where, making use of Eqs. 14-17, then: The characteristic polynomial of J (E b2 ) is λ 2 − λ(J 11 + J 22 ) + J 11 J 22 = 0 which leads to the following roots (eigenvalues): λ 1 = J 22 and λ 1 = J 11 . The sign of J 11 and J 22 is therefore crucial to determine the stability of this border equilibrium E b2 . It is easy to see that J 11 is positive when W is positive, while if W is negative, J 11 is negative when γ < − W (DL+DF)

DL
, that is to say, when the peer effect is "small enough." Moreover, if W is positive, it implies that the border equilibrium lies outside the strategy space of the players and therefore we must impose W negative as a constraint. Similarly to what has been done with the previous border equilibrium, with little manipulations we can rewrite J 22 as: J 22 = −W · y * o whose sign depends exclusively on W . Therefore, J 22 is negative when W is positive and vice versa. So, in order to be stable, the border equilibrium E b2 must have two negative eigenvalues associated with his Jacobian. However, the values of W that guarantee the simultaneous negative sign of J 11 and J 22 cannot coexist, because J 11 is negative for negative values of W and sufficiently small peer effect, while J 22 is negative when W is positive. It follows that the border equilibrium E b2 might be a saddle point, when the peer effect is small; otherwise, it is globally unstable.
• The interior equilibrium (x * h , y * O ) is defined as , and at this equilibrium point, the corresponding Jacobian matrix is defined as: The eigenvalues of this matrix can be computed by solving for λ the characteristic polynomial given by the following expression: Now, it easy to realize that J 22 is always positive for any feasible value of the parameters. J 21 instead is positive when (W − k) is positive, and negative otherwise. In order, this equilibrium exists, W and K must be both negative or both positive. (If W and K have opposite sign, the equilibrium is not included in the feasible strategy space, and moreover, |W | > |K |.) The fact that J 22 was positive implies that the eigenvalues of this Jacobian matrix cannot be contemporaneously negative, conditions under which stability is guaranteed. If indeed J 2 22 + 4J 21 J 12 is positive, we have a saddle point, while if J 2 22 + 4J 21 J 12 is negative, we will have two complex eigenvalues with positive real part. Let us develop the conditions for that to happen. With some manipulations, the expression J 2 22 + 4J 21 J 12 can be rewritten as a function of γ as follows: J 2 22 + 4J 21 J 12 = γ 2 · y * 2 (1 − y * ) 2 − 4(y * (1 − y * ) 3 (W − K )Z ) + −γ 4y * (1 − y * ) 2 (K + W )(W − K )Z + −4y * (1 − y * )(W − K )Z K W where It is easy to see that the coefficient attached to γ 2 might be positive or negative depending on the difference between the excess of utility of non-obey with respect to obey in hard and in soft lockdown. If this difference, represented by W − K , is negative, it is easy to show that the coefficient attached to γ 2 is positive, and this implies that if the peer effect is small enough, there exist two real eigenvalues for the Jacobian associated with this equilibrium, one positive and one negative, making the equilibrium an unstable saddle node. Solving Eq. 24 for γ , indeed the solution of the inequality J 2 22 + 4J 21 J 12 > 0, which guarantees real eigenvalues, is internal to the roots. In indeed the excess of utility of disobey in hard lockdown is greater than in soft lockdown; that is, if W − K > 0, the sign of the coefficient attached to γ 2 is ambiguous. If it is still positive, the same conditions stated before hold (we have a saddle point); otherwise, we will have two complex eigenvalues with positive real part, which leads the system rotate around this equilibrium, but converging to either one of the two stable equilibria (0,1) or (1,0). Solving Eq. 24 for γ and rearranging, taking into consideration the expression for Z we get the following roots: (1 − y) 2 4·DL W −K − DF DF+DL It is easy to see that if W − K < 0, both roots are real, and the solutions of the inequality lie at the interior of the interval between γ 1 and γ 2 . Moreover, these conditions guarantee that the eigenvalues associated with the Jacobian are real. In particular, in order to have real eigenvalues, the peer effect has to be smaller "enough," that is to say, it has to be included between 0 and the greater between γ 1 and γ 2 , depending on the sign and dimension of K . If these conditions are not met, that is to say, if the peer effect is greater than the threshold, then the eigenvalues are complex with positive real part, and this implies that the system will diverge from the interior equilibrium following an enlarging spiral toward one of the two stable equilibria (0,1) or (1,0).