Measuring the price of anarchy in critical care unit interactions

Abstract Hospital throughput is often studied and optimised in isolation, ignoring the interactions between hospitals. In this paper, critical care unit (CCU) interaction is placed within a game theoretic framework. The methodology involves the use of a normal form game underpinned by a two-dimensional continuous Markov chain. A theorem is given that proves that a Nash Equilibrium exists in pure strategies for the games considered. In the United Kingdom, a variety of utilisation targets are often discussed, aiming to ensure that wards/hospitals operate at a given utilisation value. The effect of these target policies is investigated, justifying their use to align the interests of individual hospitals and social welfare. In particular, we identify the lowest value of a utilisation target that aligns these.


Introduction
The effect of state dependent service rates in healthcare has been well studied in isolated hospitals. In Batt and Terwiesch (2012), an empirical study is undertaken and service rate slow down is identified. Furthermore, it is shown that modelling whilst ignoring these state dependent rates leads to errors. This effect is further identified in Chan et al (2014) and Kc and Terwiesch (2012), where for example the negative effect on patient outcome is revealed. In Kim et al (2013), Shmueli et al (2003), this is expanded to consider a variety of admission policies in two CCUs; in particular the effect of rejection (due to too high occupancy) is measured. These policies are studied in the setting of a single hospital, and thus, from a game theoretic point of view correspond to rational behaviour (a good game theory reference text is Maschler et al (2013)). In practice, a simplification of rational strategies is managed by policy makers by setting utilisation targets that ensure that hospitals do not run at a level likely to have a high rejection rate (examples of these can be seen in Bevan and Hood (2006), Kesavan (2013)).
The aim of this paper is to further investigate the effect of rational policies employed by hospitals. In particular, the goal is to place this in a game theoretic setting so as to identify whether or not target policies result in uncoordinated behaviour that is damaging for patients.
A critical care unit (CCU), also sometimes referred to as an Intensive Therapy Unit or Intensive Therapy Department, is a special ward that is found in most acute hospitals. It provides intensive care (treatment and monitoring) for people who are critically ill or are in an unstable condition. CCUs face major challenges: on average, 8% of patients are refused admission to a CCU because the Unit is full (Audit Commission, 1999). The CCU occupancy rates for some hospitals are reportedly very high (Mitchell and Grounds, 1995; Smith et al, 1995) and a shortage of beds has been identified throughout the United Kingdom.
Many previous researchers have developed queueing models to help manage bed capacities in hospitals: Cooper and Corcoran (1974), Dumas (1984), Gallivan and Utley (2011), Gorunescu et al (2002), Griffiths et al (2013), Harper and Shahani (2002). Also, a vast amount of literature has been devoted to the simulation of CCUs; Cahill and Render (1999), Costa et al (2003), Griffiths et al (2005), Kim et al (1999), Litvak et al (2008), Shahani et al (2008). This paper describes part of a project undertaken with managers from the Aneurin Bevan University Health Board (ABUHB), which is an NHS Wales organisation in South Wales, that serves 21% of the total population of Wales (Board, 2014). Critical care is delivered on two sites, at the Nevill Hall hospital in Abergavenny and at the Royal Gwent hospital in Newport. For the remainder of this paper, the Nevill Hall hospital will be referred to as NH and the Royal Gwent hospital as RG.
The main proposition of the work requested by the ABUHB was to develop a mathematical model of bed occupancies at the CCUs at RG and NH. After an initial analysis of the data, behavioural aspects became apparent; for example, delaying patient discharge if there was no pressure on CCU beds, or admitting fewer patients if bed occupancy levels were high. As a result of this, a state-dependent queueing model has been developed, which includes the dependency of admission rate on actual occupancy (Williams et al 2015). This state dependent model was applied to both NH and RG separately. It is however obvious that the actions of one CCU impact on the other CCU, as diversion of patients from one CCU to the other sometimes occurs. A pictorial representation of the situation is given in Figure 1.
Most research applying game theory in healthcare has concentrated on emergency departments (EDs) and how to deal with diversions of patients and ambulances. In Hagtvedt and Ferguson (2009), cooperative strategies for hospitals are considered, in order to reduce occurrences when ambulances are turned away due to the ED being full. In Deo and Gurvich (2011), a queueing network model of two EDs is proposed to study the network effect of ambulance diversion. Each ED aims to minimise the expected waiting time of its patients (walk-ins and ambulances) and chooses its diversion threshold based on the number of patients at its location. Decentralised decision making in the network is modelled as a non-cooperative game.
Other work applying game theory to healthcare outside of EDs includes Knight and Harper (2013), where results concerning the congestion related implications of decisions made by patients when choosing between healthcare facilities were presented, and Howard (2002), where a model of the accept/reject decision for transplant organs is developed.
In the wider intersection of game theory and queueing theory (where this work lies), papers that consider price and/or capacity include Allon and Federgruen (2007), Cachon and Harker (2002), Cachon and Zhang (2007), Kalai et al (1992), Levhari and Luski (1978). In these models, the choice of price and/or capacity determines the arrival rate for each firm; this is similar to the approach taken in this paper.
The work presented in this paper contributes to the growing body of literature by applying state-dependent queueing models in a game theoretical context to CCU interaction. In particular, this consideration allows for the investigation of targets imposed by central control (Bevan and Hood, 2006). The findings of this work justify and identify a choice of targets that align the interests of the individual hospitals with social welfare.
The data used for the work presented in this paper were provided by the Intensive Care National Audit and Research Centre (ICNARC) and refer to patients admitted to CCUs in NH and RG, and cover a period of three years, from the 1st January 2009 till the 31st December 2011. The dataset contains information about a patient's source of admission, date and time of admission, date and time of discharge, CCU outcome and delay to discharge. The parameters obtained from the data used in this work are shown in Table 1.
Note that the inter-arrival and service rates are not state dependent. These will serve as a base level for the state dependent rates used throughout the game theoretic models.
The paper is organised as follows: Section 2 presents the general methodology as well as a theoretical existence condition for Nash Equilibrium. Section 3 presents the findings from two models. Finally, conclusions and further ideas for progression are given in Section 4.

Queueing and game theoretic models
Throughout this paper, it is assumed that both CCUs (NH and RG) act selfishly and rationally. The strategies of each CCU are capacity thresholds at which they declare being in "diversion". When in "diversion" the arrival rates of patients are modified. Given the proximity of the two CCUs, one CCU could for example divert their patients to the other. Figure 2 shows a diagrammatic representation of this, where $\lambda_H^{r}$ for $H \in \{NH, RG\}$ and $r \in \{(l, l), (l, h), (h, l), (h, h)\}$ simply denote arrival rates that will be defined for both models considered in Section 3, and where $r$ denotes regions with boundaries defined by the capacity thresholds. For example, $(l, h)$ denotes a region for which NH experiences low demand and RG experiences high demand. It is also assumed that diverted patients will be treated under the length of stay profile of the CCU they are admitted to. The capacity thresholds are denoted as $K_H \in \mathbb{Z}$ for $H \in \{NH, RG\}$. Note that $0 \le K_H \le c_H$.
To formally investigate the impact of decentralised decision making, the interaction between two CCUs is placed within a non-cooperative game framework. The interaction will be modelled through a two dimensional continuous Markov chain that will now be described.

Queueing model
The state space for the Markov chain is given by (1).
Figure 1 Diagrammatic representation of CCU interaction through patient diversion.
For given $K_{NH}, K_{RG}$ and using the notation of Figure 2, the generic Markov chain used to model the interactive queueing system in this paper is shown in Figure 3.
The stochastic transition rate matrix $Q = Q(c_{NH}, c_{RG})$ of the continuous-time Markov chain (Stewart 2009) has entries $q_{ij} = q_{(u_i, v_i),(u_j, v_j)}$, each of which is the rate at which a transition from state $i$ to state $j$ occurs. The transition rates are given by (2) and are illustrated diagrammatically in Figure 3.
Utilities will be of interest when this queueing model is embedded in the game theoretic model. Throughput of patients is a natural choice of utility given that most hospitals are financially rewarded per served patient (Pate, 2009). For each threshold pair $(K_{NH}, K_{RG})$, the utilisation rate $U_H$ and throughput $T_H$ of each CCU $H \in \{NH, RG\}$ can be obtained from $P^{(H)} = P^{(H)}(K_{NH}, K_{RG})$, the steady-state probability distribution function of the Markov chain. An example of such steady-state probabilities is shown in Figure 4. For the parameters of Figure 4, the utilisation rates are 59% at NH and 62% at RG and the throughputs are 1.23 at NH and 1.98 at RG (patients per day).
For a different threshold pair of $(K_{NH}, K_{RG}) = (1, 12)$ the steady-state probabilities are given in Figure 5. The utilisation rates are 11% at NH and 67% at RG and the throughputs are 0.23 at NH and 2.13 at RG. We see that RG is now busier as a result of NH having a lower diversion threshold. A model of this interaction will be given in the next section.
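As an illustration of how utilisation and throughput can be read off a steady-state distribution, the following is a minimal sketch rather than the computation used in this paper. It assumes that utilisation is the mean occupancy divided by capacity and that throughput is a constant per-patient service rate multiplied by the mean occupancy. The function name, the array layout `pi[u, v]` and the parameters `mu_nh`, `mu_rg` are hypothetical.

```python
import numpy as np


def utilisation_and_throughput(pi, mu_nh, mu_rg):
    """Return {"NH": (U, T), "RG": (U, T)} from a joint distribution.

    pi[u, v] is the steady-state probability of u patients at NH and
    v patients at RG (assumed layout). mu_nh and mu_rg are constant
    per-patient service rates (an assumption made for this sketch).
    """
    pi = np.asarray(pi, dtype=float)
    c_nh, c_rg = pi.shape[0] - 1, pi.shape[1] - 1
    # Marginal occupancy distribution of each CCU.
    p_nh = pi.sum(axis=1)
    p_rg = pi.sum(axis=0)
    # Mean number of occupied beds at each CCU.
    mean_nh = float(np.dot(np.arange(c_nh + 1), p_nh))
    mean_rg = float(np.dot(np.arange(c_rg + 1), p_rg))
    return {
        "NH": (mean_nh / c_nh, mu_nh * mean_nh),
        "RG": (mean_rg / c_rg, mu_rg * mean_rg),
    }


# Toy distribution: uniform over a 3 x 2 state space (capacities 2 and 1).
toy_pi = np.full((3, 2), 1 / 6)
toy_result = utilisation_and_throughput(toy_pi, mu_nh=0.2, mu_rg=0.4)
```

With state dependent service rates, the throughput line would instead weight each occupancy level by its own rate.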

Game theoretic model
Based on the discussion above, the game theoretic model is presented as the following synchronous optimisation problem.
Optimisation problem.
For all $H \in \{NH, RG\}$ minimise
$$(U_H - t)^2 \quad \text{subject to } 0 \le K_H \le c_H,\ K_H \in \mathbb{Z}. \qquad (3)$$
This game is equivalent to a bimatrix game with restriction to pure strategies where both players aim to get their utilisation as close as possible to a certain target $t$. A Nash Equilibrium in pure strategies is not guaranteed by traditional game theoretical results (Nash, 1950), which guarantee the existence of equilibria in mixed strategies. Based on discussions with ABUHB, long-term threshold policies are a realistic consideration and so a pure strategy space is used.
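The search for pure strategy equilibria in such a bimatrix game can be sketched as follows. This is an illustrative sketch only: it assumes that each player's cost is the squared distance between its utilisation and the target, and the utilisation tables below are made-up numbers rather than values from the paper. The function names are hypothetical.

```python
import numpy as np


def target_costs(utilisation, t):
    """Squared distance of each utilisation value from the target t."""
    return (np.asarray(utilisation, dtype=float) - t) ** 2


def pure_nash_equilibria(cost_nh, cost_rg):
    """List pure Nash equilibria of a cost-minimising bimatrix game.

    cost_nh[i, j] and cost_rg[i, j] are the costs incurred when NH plays
    threshold index i (rows) and RG plays threshold index j (columns).
    """
    cost_nh = np.asarray(cost_nh, dtype=float)
    cost_rg = np.asarray(cost_rg, dtype=float)
    # NH best responds within each column, RG within each row.
    nh_best = cost_nh == cost_nh.min(axis=0, keepdims=True)
    rg_best = cost_rg == cost_rg.min(axis=1, keepdims=True)
    # An equilibrium is a cell that is a best response for both players.
    return [(int(i), int(j)) for i, j in np.argwhere(nh_best & rg_best)]


# Made-up utilisation tables for two thresholds per player.
toy_u_nh = [[0.2, 0.3], [0.6, 0.8]]
toy_u_rg = [[0.5, 0.7], [0.4, 0.6]]
toy_equilibria = pure_nash_equilibria(
    target_costs(toy_u_nh, 0.7), target_costs(toy_u_rg, 0.7)
)
```

Exact floating point equality is used to detect ties here for brevity; a tolerance would be safer when the utilisations come from a numerical steady-state solve.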
The following result is a sufficient condition for the existence of an equilibrium:
Theorem Let $f_H(k) : [1, c_H] \to [1, c_H]$ be the best response of player $H \in \{NH, RG\}$ to the diversion threshold of the other player. If $f_H(k)$ is a non-decreasing function in $k$ then the game of (3) has at least one Nash Equilibrium.
Proof The function $f_H$ is well defined as it optimises a function over a finite discrete set. In case of multiple values that attain the optimum, it is assumed that $f_H$ returns the lowest such value; this is consistent with the Price of Anarchy (PoA) being a theoretical upper bound of the effect of uncoordinated behaviour (Roughgarden, 2005).
As such if $f_H$ is non-decreasing, then it is in fact a stepwise non-decreasing function. If we consider $f_{NH}$ and $f_{RG}$ plotted on the same axes (so that the domain of $f_{NH}$ is the x-axis and the domain of $f_{RG}$ is the y-axis), it is clear that the functions must intersect at some point, as shown in Figure 6.
This point of intersection corresponds to a Nash Equilibrium of (3). □
This Theorem is in itself not that useful as the properties of $f_H$ are difficult to ascertain, although the methodology alluded to (exhaustive investigation of best response functions) is how the equilibria are found for the work presented here. The following Lemma will however be of more use in Section 3.
Lemma Using the convention of Figure 2:
Based on the previous observation this in turn implies:
In the same way (illustrated by Figure 8), this gives:
Using (4) and (5), the general inequalities associated with $U_{NH}(K_{NH}, K_{RG})$ are summarised in Figure 9.
As $U_{NH}$ increases from the lowest value (top left in Figure 9) to the highest (bottom right in Figure 9), the value of $U_{NH} - t$ will change sign (from negative to positive). Let
Thus, $f^{\pm}(K_{RG})$ is the value of $K_{NH}$ that gives the value of $U_{NH}(K_{NH}, K_{RG})$ that is closest to $t$ such that
This immediately (from Figure 9) gives
Let $S(K_{RG}) = \{f^-(K_{RG}), f^+(K_{RG})\}$. Thus, $f(K_{RG}) \in S(K_{RG})$. Note that from (5) $\max S(K_{RG}) = f^+(K_{RG})$ and $\min S(K_{RG}) = f^-(K_{RG})$. In essence $S(K_{RG})$ corresponds to the set of two values of $K_{NH}$ that give a utilisation just below and just above $t$.
There are two possibilities that will now be considered:
which is the required result.
To finish the proof consider
which, because it has been assumed that $f^-(K_{RG} + 1) = f^-(K_{RG})$, contradicts the required result (as this would imply that $f(K_{RG}) < f(K_{RG} + 1)$).
Figure 7 The effect of the threshold ($K_{RG}$) on the effective arrival rate ($\lambda$).
Figure 9 Values of utilisation ($U_{NH}$) and the corresponding inequalities for varying cutoff thresholds.
Recalling equations (4-5) and Figure 9 gives
As $f(K_{RG}) = f^+(K_{RG})$ (this is the assumption made):
Similarly as $f(K_{RG} + 1) = f^-(K_{RG} + 1)$ (this is the assumption made):
Combining (17) and (19) contradicts (15), implying that the original assumption was incorrect, thus proving the required result. □
The aim of the work presented is to measure the inefficiency created by the removal of central control between CCUs.
We let $\tilde{T}$ denote the sum of throughputs at the Nash Equilibrium obtained by solving the game of (3) (in case of multiple equilibria we take $\tilde{T}$ to be the lowest throughput) and let $T^* = \max_{K_{NH}, K_{RG}} (T_{NH} + T_{RG})$. This optimal throughput $T^*$ is independent of $t$.
Note that intuitively it could be thought that $T^*$ is obtained when $K_{NH} = c_{NH}$ and $K_{RG} = c_{RG}$; however, this is not always the case (numerical experiments have been carried out to verify this).
The measure used to quantify inefficiency is the Price of Anarchy (PoA) (Koutsoupias and Papadimitriou, 1999; Roughgarden, 2005), which is the ratio of the social optimum welfare to the welfare of the worst Nash Equilibrium. That is, the ratio of the largest social welfare, $T^*$, to the smallest social welfare, $\tilde{T}$, achieved at any Nash Equilibrium. Thus,
$$\text{PoA} = \frac{T^*}{\tilde{T}}.$$
Note that the classic definition of PoA has been modified here to allow for a maximisation problem. Social welfare is here considered to be a maximisation of throughput.
An immediate alignment of interests can be obtained by setting $t = 1$. This however would not be in the interest of the hospital (nor necessarily in the interests of patients) as it would imply aiming to run at 100% utilisation, which implies a large quantity of patients being rejected. A sensible value of $t$ is the lowest value of $t$ that ensures a low PoA.
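The ratio described above can be sketched in a few lines. This is a hedged illustration: the function name and the dictionary layout (threshold pair mapped to total throughput) are assumptions, and the numbers in the example are made up.

```python
def price_of_anarchy(total_throughput, equilibria):
    """Ratio of the optimal total throughput to the worst equilibrium one.

    total_throughput maps a threshold pair (K_NH, K_RG) to T_NH + T_RG.
    equilibria is a non-empty collection of threshold pairs that are
    Nash equilibria of the game.
    """
    if not equilibria:
        raise ValueError("at least one Nash Equilibrium is required")
    # Social optimum: best total throughput over all threshold pairs.
    t_star = max(total_throughput.values())
    # Worst case: lowest total throughput over the equilibria.
    t_tilde = min(total_throughput[pair] for pair in equilibria)
    return t_star / t_tilde


# Made-up throughput table with two equilibria.
toy_throughput = {(0, 0): 1.0, (1, 1): 2.0, (2, 2): 2.5}
toy_poa = price_of_anarchy(toy_throughput, [(1, 1), (2, 2)])
```

Taking the minimum over the equilibria is what makes the measure a worst-case one when several equilibria exist.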

Results
The game theoretic model of (3) is solved using exhaustive consideration of best responses whilst taking advantage of the structure identified by the Lemma of Section 2. For any given pair of threshold strategies, the matrix equation $\pi Q = 0$ is solved by obtaining a basis for the kernel of $Q$. For the purpose of this paper, this is implemented in Sagemath (Stein et al, 2013).
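A kernel-based steady-state solve of this kind can be sketched with NumPy in place of Sagemath; this substitution is made purely for illustration and is not the implementation used here. The sketch assumes the chain is irreducible, so that the kernel is one-dimensional, and the small birth-death generator in the example is made up.

```python
import numpy as np


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible generator Q."""
    Q = np.asarray(Q, dtype=float)
    # pi Q = 0 is equivalent to Q^T pi^T = 0, so pi spans the kernel of Q^T.
    # The right singular vector for the smallest singular value spans it.
    _, _, vt = np.linalg.svd(Q.T)
    pi = vt[-1]
    # Normalising by the sum fixes both the scale and the sign.
    return pi / pi.sum()


# Three-state birth-death chain with arrival rate 1 and service rate 2.
toy_Q = [[-1.0, 1.0, 0.0],
         [2.0, -3.0, 1.0],
         [0.0, 2.0, -2.0]]
toy_pi = steady_state(toy_Q)
```

For this toy chain the balance equations give probabilities proportional to 1, 1/2 and 1/4, which is a convenient check on the solver.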

Model 1: Strict diversion
This model assumes that if the bed occupancy level at both CCUs exceeds a predetermined threshold, then the admission to the CCU is cancelled. This cancellation could correspond to sending the patient to a completely different CCU (outside of the model), moving one of the current CCU patients (ready to be discharged) to another ward and/or using the post-anaesthesia care unit as a temporary overflow measure. This model corresponds to the first possibility: the patient is lost (from the point of view of this model).
Recalling Figure 2  We immediately see that the Lemma of Section 2 holds and so a Nash Equilibrium for our model exists.
If either CCU chooses their threshold at zero, patients are not admitted at all, and, consequently both Units are closed.
Therefore, the matrix $Q$ has entries $q_{ij}$ as follows:
For the parameters of Table 1, the best responses are shown in Figure 10. For example, in Figure 10a if RG chooses $K_{RG} = 6$, NH has best response $K_{NH} = 8$. Similarly, if $K_{NH} = 2$, RG has best response $K_{RG} = 15$. A Nash Equilibrium for our game is a point at which the two best response functions intersect. For this model, the Nash Equilibrium is at (8, 16), which gives $\tilde{T} = 3.65$, and we obtain $T^* = 3.65$. Importantly, a PoA of 1 is not guaranteed for this problem. For example, in Figure 10b similar best response behaviour is shown for $t = 0.6$, for which the PoA $= 1.18$ (the optimal throughput is again at (8, 16)).
Whilst removing central control, a certain influence can be exerted by a choice of $t$. Figure 11 (note the non-linear scale) and Table 2 show the effect of $t$ and overall demand. We modify the demand rate from Table 1 by replacing $\lambda_H$ with $\lambda_H(1 + x)$ for $-0.9 \le x \le 2$.
We see that an extremely large PoA is obtained for $t < 0.2$. For values of $t > 0.5$ the PoA is still high: a PoA of 2 corresponds to the equilibrium throughput of patients being half of the optimal throughput. These findings seem to give some backing to the targets implemented throughout the NHS (Bevan and Hood, 2006).
In particular, it can be seen that a value of $t > 0.8$ becomes imperative for high demand. The lowest value of $t$ which gives PoA $= 1$ for the actual demand levels ($x = 0$) is in fact $t = 0.72$. It is also noted that as demand increases the effect of uncoordinated behaviour increases (and the recommended target also increases) as shown in Figure 12. This is potentially due to the fact that as demand increases there is the scope for larger discrepancy between optimal and suboptimal behaviours.
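The search for the lowest target that aligns individual and social interests can be sketched as a scan over candidate targets. This is a hypothetical helper, not the procedure used to produce the figures: it assumes a callable that returns the PoA for a given target (for a fixed demand level), and the PoA values in the example are made up.

```python
def lowest_aligning_target(poa_of_target, targets, tol=1e-9):
    """Return the smallest target t with PoA(t) equal to 1, or None.

    poa_of_target is any callable mapping a target t to the PoA of the
    corresponding game; tol absorbs floating point noise around 1.
    """
    for t in sorted(targets):
        if poa_of_target(t) <= 1.0 + tol:
            return t
    return None


# Made-up PoA values indexed by target, for illustration only.
made_up_poa = {0.5: 1.30, 0.6: 1.10, 0.7: 1.00, 0.8: 1.00}
toy_target = lowest_aligning_target(made_up_poa.get, made_up_poa.keys())
```

Repeating the scan for each demand change $x$ would give one recommended target per demand level.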
In this model, there is the potential for both CCUs to divert patients at the same time, and so patients are lost to the entire system. The model of the next section will investigate the effect of not allowing total rejections.
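One reading of the strict diversion rule, in terms of the arrival rate seen by each CCU in a given state, is sketched below. It is an assumption-laden illustration: a CCU is taken to be in diversion once its occupancy reaches its threshold, its patients go to the other CCU if that one is below threshold, and they are lost otherwise. The function and parameter names are hypothetical.

```python
def strict_diversion_arrivals(u, v, k_nh, k_rg, lam_nh, lam_rg):
    """Return (arrival rate into NH, arrival rate into RG) in state (u, v).

    u and v are the occupancies of NH and RG, k_nh and k_rg the diversion
    thresholds, lam_nh and lam_rg the base arrival rates.
    """
    nh_accepting = u < k_nh
    rg_accepting = v < k_rg
    if nh_accepting and rg_accepting:
        # Neither CCU is in diversion: each receives its own demand.
        return lam_nh, lam_rg
    if nh_accepting:
        # RG is in diversion: NH receives both demand streams.
        return lam_nh + lam_rg, 0.0
    if rg_accepting:
        # NH is in diversion: RG receives both demand streams.
        return 0.0, lam_nh + lam_rg
    # Both in diversion: arriving patients are lost to the system.
    return 0.0, 0.0
```

For instance, with thresholds of 1 at both CCUs and base rates 1.0 and 2.0, the state (0, 1) gives NH the combined rate 3.0 and RG a rate of 0.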
$$q_{ij} = \begin{cases} \quad\vdots & \\ \lambda_{NH}, & \text{if } (u_i, v_i) - (u_j, v_j) = (-1, 0) \text{ and } u_i < K_{NH} \text{ and } v_i < K_{RG}; \\ \lambda_{RG}, & \text{if } (u_i, v_i) - (u_j, v_j) = (0, -1) \text{ and } u_i < K_{NH} \text{ and } v_i < K_{RG}; \\ \lambda_{NH} + \lambda_{RG}, & \text{if } (u_i, v_i) - (u_j, v_j) = (-1, 0) \text{ and } u_i < K_{NH} \text{ and } v_i \ge K_{RG} \\ & \text{or } (u_i, v_i) - (u_j, v_j) = (0, -1) \text{ and } u_i \ge K_{NH} \text{ and } v_i < K_{RG}; \\ 0, & \text{otherwise.} \end{cases}$$

Model 2: Soft diversion
In this model, if bed occupancy levels at both Units exceed a pre-determined threshold, then diversions do not occur and each CCU has to accommodate their own patients. In effect we are modelling a certain level of cooperation in this case where CCUs only divert if the other CCU is not busy.
We immediately see that the Lemma of Section 2 holds and so a Nash Equilibrium for our model exists.
Therefore, the transition matrix $Q$ is obtained from the following transition rates $q_{ij}$:
As before, Figure 13 and Table 3 present the PoA for different target values and demand rate changes.
We immediately note that the underlying cooperation that is now being forced on our players (divert only if the other player can accommodate the patients) has reduced the PoA. Note that PoA $= 1.02$ still implies a reduced throughput of 2%, which has very large cost implications for a national health service. A tipping point is now visible as demand increases; this is similar to the profiles shown in Knight and Harper (2013) and can be explained as follows. When the demand is low, there is no scope for uncoordinated behaviour to be damaging. When the demand is very high, the system is saturated and once again uncoordinated behaviour has no negative effect in comparison to optimal behaviour. There is however a region of demand for which there is a high PoA.
For example, for $t = 0.8$ the PoA starts to rapidly increase for demand changes higher than 0.1, and starts to decrease for a demand change of 0.6; this region will be investigated closely. Table 4 presents results for $t = 0.8$ and a demand change from 0.1 to 0.6. For a 50% increase in demand, without a matching increase in capacity, rational behaviour of CCUs would incur 6% less patient throughput. Clearly, as the demand change increases, the Nash Equilibrium thresholds decrease. This is due to the fact that both CCUs are attempting to divert their patients in less busy states as these states become rarer. If one CCU diverts early, the other will follow suit (both CCUs incrementally reacting to each other). As a result the Nash Equilibrium for $x = 0.6$ is at (0, 0), meaning that each CCU takes care of their own patients. As the demand increases even further the Nash Equilibrium remains at (0, 0) and the PoA decreases. Figure 14 shows the lowest values of $t$ which give a PoA of 1. We see that as demand increases this value also increases. Also, for very low values of demand, cooperation can be obtained with no target. For the actual demand ($x = 0$) a target value of $t = 0.72$ is once again recommended.
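The link between a PoA value and the corresponding loss in throughput, such as the 2% quoted for a PoA of 1.02, follows directly from the definition of the PoA as the ratio of optimal to worst equilibrium throughput. A short worked derivation, given here as an illustration:

```latex
\text{PoA} = \frac{T^*}{\tilde{T}}
\quad\Longrightarrow\quad
\frac{T^* - \tilde{T}}{T^*} = 1 - \frac{1}{\text{PoA}},
\qquad\text{so that}\qquad
\text{PoA} = 1.02 \;\Longrightarrow\; 1 - \frac{1}{1.02} \approx 0.0196 \approx 2\%.
```

For PoA values close to 1, the relative loss is therefore approximately PoA $- 1$.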

Conclusions
In this work, a generic game theoretical model has been presented that accounts for the rational actions of two CCUs. This game theoretic model is underpinned by a queueing model that takes into account the stochastic nature of queueing systems. This work extends the application of game theoretic models already present in the literature to healthcare (Li et al, 2002;Xie and Ai, 2006).
A result is proved that allows for the assertion of existence of a Nash Equilibrium. This result is then applied to two particular models that are influenced by discussions with a local health board:
• Strict diversion: patients can be lost to the system if both CCUs declare being in diversion.
• Soft diversion: if both CCUs are in diversion then they cannot divert their own patients.
An analysis of the effect of rational behaviour is given for both of these models in the form of PoA calculations. The PoA is calculated so as to measure the effect of rational behaviour on overall patient throughput. The PoA represents a theoretical upper bound for the potential damages caused by uncoordinated behaviour. Higher PoAs are found in the case of strict diversion, which is to be expected as soft diversion implies a certain level of cooperation. Importantly, a non-negligible effect of rational behaviour is calculated for certain policy target values. A recommendation of setting $t = 0.72$ is found across both models. This gives some evidence to a particular target value of maximal utilisation in a two CCU ward setting. This value of $t$ is investigated against increasing demand and is shown to be increasing in overall demand across the system. Investigating demand is akin to investigating the capacity of the CCUs and as shown in Figure 15: if capacity is not sufficient, rational behaviour can have a very damaging effect on overall patient throughput.
It is vital to acknowledge the limitations of the work presented:
• The assumptions as to the strategy space of our players are restrictive: a single threshold policy might not be optimal (although it is present in various pieces of literature on optimal control of queueing systems: Naor, 1969; Shone et al, 2013). Indeed, in reality critical care managers could have far more complex boundaries for their heuristic decision making;
• This model only assumes the presence of two players; however in reality the system has a variety of stakeholders. Multiplayer systems could be worth considering. This would reflect health boards/hospitals in a concentrated area so that interactions are not just between two hospitals but between many;
• The restriction to pure strategies is influenced by discussions with ABUHB and also does not detract from the results presented thanks to the Theorem and Lemma of Section 2. However, allowing for mixed strategies could also be of interest, corresponding to decision making that is not constant over time. Managers could alternate between a variety of behaviours: at times accepting patients despite being busy and at other times not;
• Patient length of stay is assumed to be dependent on the CCU at which they receive service. A further extension of the work would be to use the service rate from the original CCU (prior to diversion). This would require a Markov chain with a higher dimensional state space. In practice this corresponds to patient morbidity corresponding to the original CCU.
Despite these mathematical limitations, the work presented here gives strong analytical evidence as to the use of policies in a decentralised healthcare environment. As discussed above, reducing the decision making of critical care managers to rational reactions to capacity targets is not without limitations. However, this quantitative model of behaviour was described as insightful and informative by ABUHB. In practice, stakeholders describe a target of 80% capacity. This is not only at times impossible, but is also not evidence based and in particular does not take into account interactions between CCUs. This is a common theme in practice and the literature which this manuscript aims to address.
Finally, congestion and throughput are not the only concerns of a healthcare system. Further work could involve the investigation of patient survival instead of throughput as utility. This would be similar to work such as Erkut et al (2008), Knight et al (2012).