A Game-Theoretical Winner and Loser Model of Dominance Hierarchy Formation

Many animals spend large parts of their lives in groups. Within such groups, they need to find efficient ways of dividing available resources between them. This is often achieved by means of a dominance hierarchy, which in its most extreme linear form allocates a strict priority order to the individuals. Once a hierarchy is formed, it is often stable over long periods, but the formation of hierarchies among individuals with little or no knowledge of each other can involve aggressive contests. The outcome of such contests can have significant effects on later contests, with previous winners more likely to win (winner effects) and previous losers more likely to lose (loser effects). This scenario has been modelled by a number of authors, in particular by Dugatkin. In his model, individuals engage in aggressive contests if the assessment of their fighting ability relative to their opponent is above a threshold θ. Here we present a model where each individual can choose its own value of θ. This enables us to address questions such as: how aggressive should individuals be in order to take up one of the first places in the hierarchy? We find that a unique strategy evolves, as opposed to a mixture of strategies. Thus, in any scenario there exists a unique best level of aggression, and individuals should not switch between strategies. We find that, under the optimal strategy choice, the hierarchy forms quickly, after which there are no mutually aggressive contests.


Introduction
Very often, animals that share the same territory engage in pairwise aggressive interactions leading to the formation of dominance hierarchies (Hand 1986). Here we are interested in groups of animals that meet for the first time and have to engage in these aggressive interactions in order to divide their resources. In smaller groups of animals, the hierarchy tends to be linear: one individual dominates all of the others, a second individual dominates all of the others except the top-ranked individual, and so on (see Addison and Simmel 1970; Barkan and Strahl 1986; Goessmann et al. 2000; Wilson 1971). In larger groups, the hierarchy is more complex, and the position of lower-ranked individuals in particular may be unclear, e.g. in chimpanzees, baboons and hyenas (Kummer 1984; Möller et al. 2001, 2006; Widdig et al. 2001). Some animals are more aggressive than others, and the level of aggressiveness depends upon many factors such as experience, the value of winning the contest and resource holding potential (RHP) (see e.g. Blanchard et al. 1988; Blanchard and Blanchard 1977; Moss et al. 1994; Takahashi and Lore 1983; Taylor 1982). In our model, RHP is simply the ability of an individual to win an escalated contest (Parker 1974), abstracted away from any particular cause. In reality, a large number of elements determine RHP. Very broadly, these elements can be divided into physical attributes, such as size, age and physical strength (intrinsic factors), and psychological attributes, such as prior experience (extrinsic factors).
More specifically, many studies have demonstrated a strong correlation between RHP and body size (Alexander 1961; Bridge et al. 2000; Lindström 1992). For example, larger animals have been observed to be more aggressive towards smaller ones and to have a higher chance of winning an encounter (Frey and Miller 1972; Knights 1987). However, other results show that such physical attributes are not the only important determinant of RHP. For example, Brown et al. (2006) showed that in house crickets 37.5 % of the group won aggressive interactions even though they had a smaller body size. In Hofmann and Schildberger (2001), bigger individuals lost 30 % of the aggressive interactions.
Prior experience can also have an important effect on an individual's RHP. For example, an individual that has won more fights than it has lost in the past may have an increased potential to win in the future.
The aim of this paper is to explore the relationship between extrinsic factors, in particular prior experience, and hierarchy formation. We therefore assume that all individuals in our model have identical physical abilities, so that the outcome of an encounter is determined by past experience (although our results depend only upon a mechanical updating of RHP after a contest, so the model would equally allow for real physical as well as psychological changes). In particular, we consider so-called winner and loser effects. The winner effect occurs when winning a previous contest increases the chances that an animal wins a subsequent contest. The loser effect occurs when a previous loss similarly increases the chances of defeat in the next contest.
A number of authors have analysed the influence of winner and loser effects on dominance hierarchy formation (e.g. Bonabeau et al. 1999; Dugatkin 1997; Dugatkin and Dugatkin 2007; Hemelrijk 2000). The first models were developed by Landau (1951a, b). He demonstrated the importance of winner and loser effects: only when extrinsic factors were considered in addition to intrinsic ones did the resulting hierarchies resemble those found in nature. Landau considered populations in which winner and loser effects operated together, but there is evidence that some groups of animals experience only winner effects or only loser effects (Bakker et al. 1989; Bergman et al. 2003; Lindquist and Chase 2009; Schuett 1997). Dugatkin and Dugatkin (Dugatkin 1997; Dugatkin and Dugatkin 2007) developed a model in which these effects were considered either in isolation or together in a group of 4 individuals. In Dugatkin (1997), each individual only knew its own RHP after a win or loss and had no information about its opponent's strength, except at time t = 1. He predicted that when only the winner effect is at play, the emerging dominance hierarchies are linear and the strength of the winner effect is not important. By contrast, when only the loser effect is present, hierarchies are found in which only the top-ranked individual is determined (the positions of the rest of the group remain unclear). When both winner and loser effects are present, nonlinear hierarchies emerge in which only the first place, and sometimes the second place, is clear. In Dugatkin and Dugatkin (2007), each individual was aware of its own RHP and could make an imperfect estimate of its opponent's RHP at each point in time. They concluded that overestimating or underestimating the opponent's strength has no influence on linearity: in both cases, linear dominance hierarchies were established.
In Kura et al. (2015), we analysed the temporal dynamics and the average behaviour of dominance hierarchy formation for different combinations of winner and loser effects, using the model developed by Dugatkin (1997). We concluded that a group of individuals does not need perfect knowledge of each other's RHP in order to establish a linear dominance hierarchy; a little information about the current RHP estimate of an individual's opponent is enough. We used different statistical measures, such as the overlap between the distributions of the RHP of each individual over time, to check for distinguishability between a pair of individuals. The index of linearity was used to measure how far from linearity each hierarchy is. Furthermore, we considered the question of how many fights are needed for a dominance hierarchy to be established, and we found that this number is relatively low.
In Dugatkin (1997) and Dugatkin and Dugatkin (2007) (as well as Kura et al. 2015), each individual had the same fixed level of aggression; all individuals would retreat at the same value of the difference between their numbers of wins and losses. In this paper, we introduce a game-theoretical element, in the form of the aggression level, into this model. We assume that each individual can choose its own strategy, independently of its opponent's strategy. We are particularly interested in determining the appropriate level of the aggression threshold and in exploring whether a unique strategy, or a mixture of strategies, emerges in the population considered. Our model set-up allows us to answer questions such as: under what circumstances should an individual fight more in order to establish a higher rank in the hierarchy, and when should it retreat? We use a framework similar to the Hawk-Dove model (Maynard Smith 1982), where an individual can choose either to fight or to concede, with each individual making its choice simultaneously. When two individuals choose to fight, they engage in an aggressive interaction; the winner multiplies its RHP by a factor $1 + V_1$, and the loser multiplies its RHP by a factor $1 - C_1$. When one individual fights and the other concedes, the individual that fights multiplies its RHP by $1 + V_2$ and the retreating individual has its RHP multiplied by $1 - C_2$. When both individuals retreat, both have their RHP multiplied by $1 - C_2$. Individuals choose their strategies, i.e. whether to fight or to concede in an aggressive interaction given their history of fights won and lost, from a range of possible strategies. For each of these possible strategies, we determine the resulting expected payoff and conclude whether the chosen strategy is beneficial to the individual or not.
We will analyse two cases: when each individual chooses a strategy under which it fights in all interactions, and when individuals choose strategies under which they fight until a certain point in time (based upon how many contests they have won or lost) and retreat afterwards. We will determine the evolutionarily stable strategies (ESSs) for this fighting game, where an ESS is a strategy that, when played by almost all members of the population, cannot be invaded by any other strategy. We will also calculate the possible stopping times of the game for different strategies and analyse the relationship between the stopping time and the difference between the numbers of wins and losses of an individual.
As explained above, individuals fight for greater access to resources, and we will investigate the effects of different payoff functions on the ESSs within our model. In particular, we compare payoffs which depend upon the level of resource an individual receives with those which depend upon the proportion of the overall resource that it receives. The latter payoff function is particularly appropriate when resources are scarce. Once the dominance hierarchy is established, it is easier for the group to divide resources among its members: the higher the position in the hierarchy, the higher the payoff. The division of resources has been analysed by different authors (see e.g. Broom and Ruxton 2001; Keller and Reeve 1994). We will use the concept of reproductive skew (Broom et al. 2009; Keller and Reeve 1994; Reeve and Keller 2001; Shen and Reeve 2010; Vehrencamp 1983), which refers to the distribution of reproductive rights in a group of animals. We use the term more generally to refer to how limited resources, and hence payoffs (which are generally proportional to reproductive levels in evolutionary games), are divided among our group. When the reproductive skew is high, the division of resources is uneven, with the high-ranking individuals obtaining more resources than the lower-ranking ones (for example, see Drews 1993; Monnin and Ratnieks 1999; Rood 1980). In contrast, if the reproductive skew is low, the division of resources is even and individuals of all ranks have similar resource levels (see Brown 2014; Mangold et al. 2015). Further, we will explore the interplay between all three game-theoretical elements, $V_i$, $C_i$ and the strategies $\theta_x$, and analyse whether there is a general pattern for the ESS when the $V_i$ and $C_i$ are increased (or decreased). Additionally, we develop a simulation framework to investigate the effect of group size on the level of aggression. We note that Andersen et al. (2004) developed an alternative optimisation-based model to analyse the effect of group size on aggression level and showed that the theoretical results obtained are supported by experimental data observed in domesticated pigs; we discuss this in Sect. 6. Lastly, we compare our theoretical results with experimental evidence, which differs considerably between groups of animals such as birds, farmed animals and fish (see e.g. Andersen et al. 2004; Bilčík and Keeling 2000; Estévez et al. 1997; Estevez et al. 2007; Kotrschal et al. 1993; Nicol et al. 1999; Syarifuddin and Kramer 1996; Turner et al. 2001).

The Model
We assume a large population of social individuals living together in groups. At the start, groups of size N are formed at random, so that all individuals are members of a group, and we analyse a specific group of N individuals. Each individual has an RHP value, which, as mentioned in the Introduction, is a measure of its ability to win an aggressive interaction (cf. Dugatkin 1997; Dugatkin and Dugatkin 2007) and which is altered by the outcome of each interaction. At the beginning, all individuals are assigned the same initial RHP, denoted by $RHP_{initial}$. We assume that all individuals know their own RHP and that of any opponent. In each round t (t = 1, ..., T), two individuals are randomly chosen to engage in an aggressive interaction, while the rest of the individuals do not engage in any aggressive interaction. Over time, an individual's RHP changes due to winning or losing (in reality, it will mainly be the extrinsic factors that change, but our model could cope equally well with other eventualities): a win increases the RHP, a loss decreases it, and each individual keeps track of the changes in its own RHP and in those of its opponents. More precisely, suppose that at time t the two individuals pitted against each other are x and y. We denote by $RHP_{x,t}$ individual x's RHP at time t. Once it has been chosen, individual x can decide to be aggressive or to retreat, and this decision is based on its strategy $\theta_x \ge 0$, which is its aggression threshold.
Individual x fights individual y at this time (plays Hawk) if
$$\frac{RHP_{x,t}}{RHP_{y,t}} > \theta_x \tag{1}$$
holds; otherwise it retreats (plays Dove). Individual y decides in the same way, comparing $RHP_{y,t}/RHP_{x,t}$ with $\theta_y$, where $RHP_{y,t}$ and $\theta_y$ are individual y's RHP assessment score at time t and its aggression threshold, respectively. From the pairwise interaction, we get one of the following outcomes:
1. Both individuals x and y decide to engage in an aggressive interaction, and the probability that x wins is given by
$$P_{x,y}(t) = \frac{RHP_{x,t}}{RHP_{x,t} + RHP_{y,t}}, \tag{2}$$
and consequently individual y wins with probability $P_{y,x}(t) = 1 - P_{x,y}(t)$.
2. One individual engages in the aggressive interaction and the other retreats.
3. Both individuals decide not to fight (which is known as a double kowtow).
After a win, the RHP increases, and after a loss, it decreases. More precisely, if individual x wins and individual y loses, then they increase and decrease, respectively, their own RHP as follows:
$$RHP_{x,t+1} = (1 + V_1)\,RHP_{x,t}, \tag{3}$$
$$RHP_{y,t+1} = (1 - C_1)\,RHP_{y,t}. \tag{4}$$
If individual x fights and individual y retreats, then they increase and decrease, respectively, their own RHP as follows:
$$RHP_{x,t+1} = (1 + V_2)\,RHP_{x,t}, \tag{5}$$
$$RHP_{y,t+1} = (1 - C_2)\,RHP_{y,t}. \tag{6}$$
Equivalent changes to the RHPs apply if individual y wins. If both individuals retreat (double kowtow), then they decrease their RHPs as follows:
$$RHP_{x,t+1} = (1 - C_2)\,RHP_{x,t}, \qquad RHP_{y,t+1} = (1 - C_2)\,RHP_{y,t}.$$
In this model, $V_1, V_2 \ge 0$ are proportional increases in RHP and $C_1, C_2 \in [0, 1]$ are proportional decreases in RHP. The aim of each member of the population is to maximise its payoff at time T. In the following, we assume that the payoff function is the natural logarithm of the RHP (which corresponds to the situation of unlimited resources), but in Sect. 3.5 we consider the effects of an alternative payoff function (which corresponds to the situation of limited resources). There are two main reasons for considering the natural logarithm of the RHP. Firstly, while we want to keep to Dugatkin's terminology as much as possible, the multiplicative nature of the RHP updates means that RHP values can become large very quickly. If we used the expected RHP as the payoff, then even a minuscule chance of winning enough contests to become the top individual would be worth almost any risk. Taking the logarithm means that winning (losing) any contest increases (decreases) the payoff by the same amount irrespective of the current RHP, which seems reasonable. Secondly, taking the natural logarithm of the RHP guarantees that the payoffs increase in precisely the same way as in evolutionary matrix games, and in particular the Hawk-Dove game, which we use as an analogy in this paper.
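The decision rule (1), the win probability (2) and the multiplicative updates can be combined into a single round of play. The Python sketch below is illustrative only: it assumes the strict inequality $RHP_{x,t}/RHP_{y,t} > \theta_x$ for equation (1), and the function names `wants_to_fight` and `play_round` are hypothetical, not taken from the authors' implementation.

```python
import random


def wants_to_fight(rhp_self: float, rhp_other: float, theta: float) -> bool:
    """Equation (1), assumed strict: play Hawk if the RHP ratio exceeds theta."""
    return rhp_self / rhp_other > theta


def play_round(rhp_x, rhp_y, theta_x, theta_y, V1, V2, C1, C2, draw=random.random):
    """Play one round between x and y and return their updated RHPs.

    `draw` returns a uniform number in [0, 1); it decides escalated fights.
    """
    fight_x = wants_to_fight(rhp_x, rhp_y, theta_x)
    fight_y = wants_to_fight(rhp_y, rhp_x, theta_y)
    if fight_x and fight_y:
        # Escalated contest: x wins with probability RHP_x / (RHP_x + RHP_y).
        if draw() < rhp_x / (rhp_x + rhp_y):
            return rhp_x * (1 + V1), rhp_y * (1 - C1)
        return rhp_x * (1 - C1), rhp_y * (1 + V1)
    if fight_x:  # y retreats
        return rhp_x * (1 + V2), rhp_y * (1 - C2)
    if fight_y:  # x retreats
        return rhp_x * (1 - C2), rhp_y * (1 + V2)
    # Double kowtow: both retreat, both RHPs are multiplied by 1 - C2.
    return rhp_x * (1 - C2), rhp_y * (1 - C2)


rng = random.Random(1)
rhp_x = rhp_y = 10.0
for t in range(5):
    rhp_x, rhp_y = play_round(rhp_x, rhp_y, 0.61, 0.61, 0.1, 0.1, 0.1, 0.0, rng.random)
    print(t + 1, round(rhp_x, 3), round(rhp_y, 3))
```

Passing the random source as a parameter keeps the round deterministic under test, for example by supplying a function that always returns 0.0.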
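This model set-up allows us to track the changes in RHP of all N individuals at the time points t = 1, ..., T and therefore to evaluate which strategy θ results in the highest payoff over time. In this context, the ESS introduced by Maynard Smith (1974) proves to be an important concept. An ESS is a strategy that, if adopted by a population, cannot be invaded by any other rare strategy. In general, there can be more than one ESS. In an N-player game, strategy $\theta_x$ is an ESS if, for every strategy $\theta_y \ne \theta_x$, either
$$E[\theta_x; \theta_x^{N-1}] > E[\theta_y; \theta_x^{N-1}] \qquad \text{(condition 1)}$$
or
$$E[\theta_x; \theta_x^{N-1}] = E[\theta_y; \theta_x^{N-1}] \ \text{and}\ E[\theta_x; \theta_x^{N-2}, \theta_y] > E[\theta_y; \theta_x^{N-2}, \theta_y], \qquad \text{(condition 2)}$$
where $E[\theta_x; \theta_x^{i}, \theta_y^{N-i-1}]$ and $E[\theta_y; \theta_x^{i}, \theta_y^{N-i-1}]$ are the expected payoffs of an individual playing strategy $\theta_x$ and $\theta_y$, respectively, against i individuals playing strategy $\theta_x$ and N − i − 1 individuals playing strategy $\theta_y$ (Broom et al. 1997).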
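For Sect. 3, where we consider two-player games only, the ESS definition reduces to: for every $\theta_y \ne \theta_x$, either
$$E[\theta_x, \theta_x] > E[\theta_y, \theta_x] \qquad \text{(condition 1)}$$
or
$$E[\theta_x, \theta_x] = E[\theta_y, \theta_x] \ \text{and}\ E[\theta_x, \theta_y] > E[\theta_y, \theta_y], \qquad \text{(condition 2)}$$
where $E[\theta_x, \theta_y]$ is the expected payoff of individual x against individual y with strategies $\theta_x$ and $\theta_y$, respectively.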

The Two-Individual Model
For simplicity, in this section we consider groups of two individuals only. This allows us to find some analytical results which give us general insights into the dynamics of our model. We then generalise to larger groups in Sect. 4.

Expected Payoffs When Players Always Fight ($\theta_x = \theta_y = 0$)
We assume that both individuals, denoted by x and y, possess the same initial RHP, $RHP_{initial}$. Further, individuals x and y play the strategies $\theta_x = \theta_y = 0$, meaning that both individuals will fight until time T (cf. Eq. 1). In this section and throughout the paper, we assume $V_1 = V_2 = V$, $C_1 = C$ and $C_2 = 0$. This implies that winning a fight and having one's opponent retreat have the same effect on the RHP. But, contrary to Dugatkin (1997), we do not assume that losing a fight and retreating have the same effect on the RHP. This seems plausible, as it is similar to the Hawk-Dove model to which we refer, in the sense that the loss of a fight is like an injury (whether a real injury or a psychological one). Figure 1 illustrates the possible RHP values of individual x at times t = 1 and t = 2. For example, the expected payoff of individual x at time t = 2 is obtained by weighting the logarithm of each RHP value in Fig. 1 by the probability of the corresponding sequence of wins and losses.

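Fig. 1 RHP of individual x and individual y at times t = 1 and t = 2 when they both start with the same $RHP_{initial}$ and always fight ($\theta_x = \theta_y = 0$)
An individual either wins or loses a fight, and we denote a win (loss) in the kth contest by $j_k = 1$ ($j_k = 0$). Thus, at time t individual x has $a_t$ wins and $b_t$ losses, given by
$$a_t = \sum_{k=1}^{t} j_k, \tag{9}$$
$$b_t = \sum_{k=1}^{t} (1 - j_k) = t - a_t. \tag{10}$$
The RHP of individual x, having won $a_t$ contests and lost $b_t$, is denoted by $R_{a_t,b_t}$ and is given by [cf. equations (3) and (4)]
$$R_{a_t,b_t} = RHP_{initial}\,(1+V)^{a_t}(1-C)^{b_t}.$$
The probability of winning the next contest after $a_t$ wins and $b_t$ losses is denoted by $W_{a_t,b_t}$, and the probability of losing it by $L_{a_t,b_t}$. Since individual y then has RHP $R_{b_t,a_t}$, equation (2) gives
$$W_{a_t,b_t} = \frac{R_{a_t,b_t}}{R_{a_t,b_t} + R_{b_t,a_t}}, \qquad L_{a_t,b_t} = 1 - W_{a_t,b_t}.$$
If we consider all combinations of wins and losses and use ln(RHP), then the overall expected payoff is given by
$$E[\ln(RHP_{x,T})] = \sum_{(j_1,\dots,j_T)\in\{0,1\}^T} \left[\prod_{k=1}^{T} W_{a_{k-1},b_{k-1}}^{\,j_k}\, L_{a_{k-1},b_{k-1}}^{\,1-j_k}\right] \ln R_{a_T,b_T},$$
where $a_0 = b_0 = 0$ and $a_T$ and $b_T$ are given by equations (9) and (10).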
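Because $W_{a_t,b_t}$ depends only on the counts $a_t$ and $b_t$, the $2^T$-term sum for the always-fight case can be collapsed into a forward recursion over the number of wins. The following Python sketch (the function name `expected_log_rhp_always_fight` is hypothetical) computes $E[\ln(RHP_{x,T})]$ that way, assuming $V_1 = V_2 = V$, $C_1 = C$ and $C_2 = 0$ as in the text.

```python
import math


def expected_log_rhp_always_fight(V: float, C: float, T: int, rhp0: float = 10.0) -> float:
    """E[ln RHP_{x,T}] when x and y always fight (theta_x = theta_y = 0)."""

    def R(a: int, b: int) -> float:
        # RHP after a wins and b losses: RHP_initial (1+V)^a (1-C)^b
        return rhp0 * (1 + V) ** a * (1 - C) ** b

    # prob[a] = probability that x has a wins (and t - a losses) after t contests.
    prob = {0: 1.0}
    for t in range(T):
        nxt = {}
        for a, p in prob.items():
            b = t - a
            w = R(a, b) / (R(a, b) + R(b, a))  # W_{a,b}; opponent y holds R_{b,a}
            nxt[a + 1] = nxt.get(a + 1, 0.0) + p * w
            nxt[a] = nxt.get(a, 0.0) + p * (1 - w)
        prob = nxt
    return sum(p * math.log(R(a, T - a)) for a, p in prob.items())


print(expected_log_rhp_always_fight(0.1, 0.1, 20))
```

A convenient check: by the symmetry between x and y, $E[a_T] = E[b_T] = T/2$, so the result must equal $\ln RHP_{initial} + (T/2)\ln((1+V)(1-C))$.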

Individuals with General Strategies $\theta_x$ and $\theta_y$
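In this section, we analyse the expected payoffs for individuals x and y when they have potentially nonzero and different strategies $\theta_x$ and $\theta_y$, respectively. We start by deriving a general criterion for the number of losses necessary for an individual to retreat. Suppose that at time t individual x has won $a_t$ contests against individual y and lost $b_t$. Then its RHP is $RHP_{x,t} = R_{a_t,b_t}$. In contrast, individual y has won $b_t$ contests and lost $a_t$ against individual x, resulting in an RHP of $RHP_{y,t} = R_{b_t,a_t}$. Thus, from equations (3)-(6) we obtain:
$$\frac{RHP_{x,t}}{RHP_{y,t}} = \frac{R_{a_t,b_t}}{R_{b_t,a_t}} = \left(\frac{1+V}{1-C}\right)^{a_t - b_t}.$$
The next interaction between individuals x and y will result in a fight if equation (1) holds for both individuals. In other words, the following two inequalities have to be satisfied simultaneously:
$$\left(\frac{1+V}{1-C}\right)^{a_t - b_t} > \theta_x \tag{12}$$
and
$$\left(\frac{1+V}{1-C}\right)^{b_t - a_t} > \theta_y. \tag{13}$$
Next, we take the logarithm of both sides of (12) and (13) and obtain
$$(a_t - b_t)\ln\frac{1+V}{1-C} > \ln\theta_x \tag{14}$$
and
$$(b_t - a_t)\ln\frac{1+V}{1-C} > \ln\theta_y. \tag{15}$$
We define
$$d_x = \frac{\ln\theta_x}{\ln\frac{1-C}{1+V}} \tag{16}$$
and
$$d_y = \frac{\ln\theta_y}{\ln\frac{1-C}{1+V}}, \tag{17}$$
where $d_x$ and $d_y$ are both positive numbers for any pair of individuals which do not concede immediately. As inequalities (14) and (15) have to be fulfilled simultaneously,
$$-d_x < a_t - b_t < d_y. \tag{18}$$
This means that if the excess of the number of wins over the number of losses lies strictly between $-d_x$ and $d_y$, individuals x and y will engage in a fight. If both individuals start by fighting and the first condition to fail is $a_t - b_t < d_y$, then individual y decides to retreat and individual x to fight. After retreating for the first time, an individual retreats in every contest until time T. Consequently, after y has retreated, individual x increases its RHP in every contest. By contrast, if the first condition to fail is $-d_x < a_t - b_t$, then individual x decides to retreat and individual y increases its RHP in every contest. Both individuals retreat only if this happens at t = 1. We define the time when individual x retreats by
$$T_s(x) = \min\{t : a_t - b_t \le -d_x\}, \tag{19}$$
and analogously $T_s(y) = \min\{t : a_t - b_t \ge d_y\}$, so that the stopping time of the game is
$$T_s = \min\{T_s(x), T_s(y)\}. \tag{20}$$
Then, the expected payoff $E[\ln(RHP_{x,T})]$ at time T is given by
$$E[\ln(RHP_{x,T})] = E\left[\ln R_{a_{\tau},b_{\tau}} + \ln \Delta_x\right], \qquad \tau = \min\{T_s, T\}, \tag{21}$$
where the expectation is taken over the sequences of wins and losses, weighted by the win and loss probabilities $W_{a_t,b_t}$ and $L_{a_t,b_t}$, and
$$\Delta_x = \begin{cases} (1+V)^{T-\tau}, & \text{if y retreats first, i.e. } \tau = T_s(y) < T_s(x),\\ 1, & \text{otherwise,} \end{cases}$$
is the multiplicative increase in RHP that individual x gets after the stopping time $T_s$ (a retreating individual's RHP does not change, since $C_2 = 0$).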
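Equation (21) can be evaluated without enumerating every win/loss sequence, because the win probability depends only on $a_t - b_t$: $W_{a_t,b_t} = q^{a_t-b_t}/(1+q^{a_t-b_t})$ with $q = (1+V)/(1-C)$. The sketch below is a hedged illustration, not the authors' code: `d_value` and `expected_log_payoff` are hypothetical names, it assumes the strict fighting rule of equation (1), $V_1 = V_2 = V$, $C_1 = C$, $C_2 = 0$, and treats $\theta = 0$ as "never retreat" ($d = \infty$).

```python
import math


def d_value(theta: float, V: float, C: float) -> float:
    """d = ln(theta) / ln((1-C)/(1+V)), as in eqs. (16)-(17); theta = 0 never retreats."""
    if theta == 0:
        return math.inf
    return math.log(theta) / math.log((1 - C) / (1 + V))


def expected_log_payoff(theta_x, theta_y, V, C, T, rhp0=10.0):
    """E[ln RHP_{x,T}] in the two-individual model with V1 = V2 = V, C1 = C, C2 = 0."""
    dx, dy = d_value(theta_x, V, C), d_value(theta_y, V, C)
    q = (1 + V) / (1 - C)
    up, down = math.log(1 + V), math.log(1 - C)
    prob = {0: 1.0}  # P(a_t - b_t = diff and nobody has retreated yet)
    total = 0.0
    for t in range(T + 1):
        nxt = {}
        for diff, p in prob.items():
            a, b = (t + diff) // 2, (t - diff) // 2
            log_r = math.log(rhp0) + a * up + b * down  # ln R_{a,b}
            if diff <= -dx:
                # x retreats from now on; C2 = 0 freezes its RHP
                # (this branch also covers a double kowtow at t = 0).
                total += p * log_r
            elif diff >= dy:
                # y retreats: x gains a factor (1+V) in each of the T - t remaining rounds.
                total += p * (log_r + (T - t) * up)
            elif t == T:
                # Horizon reached while both still fight.
                total += p * log_r
            else:
                # Both fight, condition (18): W_{a,b} = R_{a,b} / (R_{a,b} + R_{b,a}).
                w = q ** diff / (1 + q ** diff)
                nxt[diff + 1] = nxt.get(diff + 1, 0.0) + p * w
                nxt[diff - 1] = nxt.get(diff - 1, 0.0) + p * (1 - w)
        prob = nxt
    return total


print(expected_log_payoff(0.61, 0.61, 0.1, 0.1, 20))
```

With $\theta_x = \theta_y = 0$ the recursion reduces to the always-fight case, and if x concedes at once against an opponent that always fights, x keeps $\ln RHP_{initial}$.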
It follows from inequality (18) and the fact that $a_t - b_t$ is an integer that all θ values within a certain interval result in the same expected payoff (for fixed V and C). We denote those intervals of strategy values by $[\theta_{x,min}, \theta_{x,sup})$, where $\theta_{x,sup}$ is the value of $\theta_x$ that corresponds to $\lfloor d_x \rfloor$ and $\theta_{x,min}$ the value of $\theta_x$ that corresponds to $\lceil d_x \rceil$. The intervals are closed at the lower bound and open at the upper bound, and $\theta_{x,min} < \theta_{x,sup}$. We set $k_x = \lceil d_x \rceil$. Since $d_x = \ln\theta_x / \ln\frac{1-C}{1+V}$, the strategy value corresponding to $d_x = k_x$ is $\theta_{x,min}$ and the one corresponding to $d_x = k_x - 1$ is $\theta_{x,sup}$, which results in
$$\theta_{x,min} = \left(\frac{1-C}{1+V}\right)^{k_x}, \qquad \theta_{x,sup} = \left(\frac{1-C}{1+V}\right)^{k_x - 1}.$$
Thus, for given V and C there is a range of θ values that correspond to a given k. Importantly, each strategy θ from that range results in the same payoff. We note, however, that this range changes for different V and C. For simplicity, we assume that individual x chooses the middle value of $[\theta_{x,min}, \theta_{x,sup})$, and this strategy, denoted by $\theta_{x,rep}$, is the representative strategy of the range:
$$\theta_{x,rep} = \frac{\theta_{x,min} + \theta_{x,sup}}{2}.$$
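The mapping between strategies θ and integer retreat levels k is easy to tabulate. The Python sketch below uses hypothetical helper names (`theta_interval`, `theta_rep`, `k_from_theta`) and assumes the strict fighting rule of equation (1). For V = C = 0.1 it gives roughly $[0.548, 0.669)$ as the interval for k = 3, consistent with the [0.55, 0.67) quoted in the worked example, and midpoints that round to 0.61 for k = 3 and 0.5 for k = 4.

```python
import math


def theta_interval(k: int, V: float, C: float) -> tuple:
    """[theta_min, theta_sup) of thresholds for which x retreats once a_t - b_t = -k."""
    r = (1 - C) / (1 + V)  # r < 1 whenever V > 0 or C > 0
    return r ** k, r ** (k - 1)


def theta_rep(k: int, V: float, C: float) -> float:
    """Representative strategy: the midpoint of [theta_min, theta_sup)."""
    lo, hi = theta_interval(k, V, C)
    return (lo + hi) / 2


def k_from_theta(theta: float, V: float, C: float) -> float:
    """k = ceil(d); theta = 0 never retreats (k = inf), theta >= 1 retreats at once (k = 0)."""
    if theta == 0:
        return math.inf
    d = math.log(theta) / math.log((1 - C) / (1 + V))
    return max(math.ceil(d), 0)


for k in range(1, 9):
    lo, hi = theta_interval(k, 0.1, 0.1)
    print(k, round(lo, 3), round(hi, 3), round(theta_rep(k, 0.1, 0.1), 3))
```

Using the midpoint keeps the representative strategy safely inside its interval, so rounding errors cannot push it into a neighbouring k.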

Stopping Time $T_s$
The expected payoff $E[\ln(RHP_{x,T})]$ given by equation (21) depends on the stopping time $T_s$. In this section, we explore the properties of $T_s$ as defined by equation (20), in particular its distribution.
To do so, we first determine the values of $k_x$ and $k_y$ for individuals x and y with strategies $\theta_x$ and $\theta_y$, respectively. The stopping time is the first time at which the random process $a_t - b_t$ equals $-k_x$ or $k_y$. For instance, individual x no longer engages in aggressive interactions once $a_t - b_t \le -k_x$, and since $a_t - b_t$ changes by exactly one in each contest, the stopping time defined in equation (19) can be written alternatively as
$$T_s(x) = \min\{t : a_t - b_t = -k_x\}.$$
But which values can the stopping time $T_s(x)$ assume? The earliest possible x-stopping time is $T_s(x) = k_x$, i.e. individual y has $k_x$ consecutive wins from the start of the interaction. The next possible stopping time is $k_x + 2$, where a single win by individual x within the first $k_x$ interactions has to be met by a total of $k_x + 1$ wins by y. In general, the possible stopping times for individual x are $k_x + 2n$, $n \ge 0$. Similarly, the possible stopping times for individual y are $k_y + 2n$, $n \ge 0$. Thus, $T_s = \min\{T_s(x), T_s(y)\}$ can assume the following values:
$$T_s \in \begin{cases} \{\min(k_x, k_y) + 2n : n \ge 0\}, & k_x + k_y \text{ even},\\ \{k_x + 2n : n \ge 0\} \cup \{k_y + 2n : n \ge 0\}, & k_x + k_y \text{ odd}. \end{cases} \tag{26}$$
In summary, the stopping time defines the exact time when one individual starts to retreat for different strategy combinations. It also gives the number of interactions that need to be observed in order to distinguish between a pair of individuals, so that in our model the second individual will always concede to the first (for a different interpretation of this concept, see Kura et al. 2015).
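The distribution of $T_s$ can be computed by propagating the probability mass of $a_t - b_t$ forward in time and absorbing it at $-k_x$ or $k_y$. The following Python sketch is a hedged illustration (the name `stopping_time_distribution` is hypothetical); it assumes $k_x, k_y \ge 1$, so that both individuals fight in the first contest, and uses $q = (1+V)/(1-C)$ for the win probability.

```python
def stopping_time_distribution(k_x: int, k_y: int, V: float, C: float, T: int) -> dict:
    """P(T_s = t) for t = 1..T, where T_s is the first t with a_t - b_t = -k_x or k_y.

    Assumes k_x, k_y >= 1, i.e. both individuals fight in the first contest.
    """
    q = (1 + V) / (1 - C)
    prob = {0: 1.0}  # P(a_t - b_t = diff and neither individual has retreated)
    dist = {}
    for t in range(1, T + 1):
        nxt = {}
        for diff, p in prob.items():
            w = q ** diff / (1 + q ** diff)  # probability that x wins contest t
            for new_diff, pr in ((diff + 1, w), (diff - 1, 1 - w)):
                if new_diff == -k_x or new_diff == k_y:
                    dist[t] = dist.get(t, 0.0) + p * pr  # someone retreats from now on
                else:
                    nxt[new_diff] = nxt.get(new_diff, 0.0) + p * pr
        prob = nxt
    return dist


dist_42 = stopping_time_distribution(4, 2, 0.1, 0.1, 20)
print(sorted(dist_42))         # only even times appear, since k_x + k_y = 6 is even
print(sum(dist_42.values()))   # P(T_s <= 20)
```

The parity pattern of equation (26) falls out directly: each contest changes $a_t - b_t$ by one, so absorption at $-k_x$ or $k_y$ can only happen at times of matching parity.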
Note that our model can generate one effect, a winner effect or a loser effect, without the other. For example, for V > 0 and C = 0 only the winner effect is in place; Tables 5 and 6 show the expected payoffs for different strategy values when V = 0.1 and C = 0. On the other hand, when C > 0 and V = 0, as illustrated by Tables 7 and 8, only the loser effect is operating.
In the next section, we derive the distribution of $T_s$ for the parameter constellation V = C = 0.1 (both winner and loser effects influence RHP).

Example:
To illustrate the findings of the previous sections, we consider an example with parameters V = 0.1, C = 0.1 and T = 20. In particular, we calculate the expected payoffs $E[\ln(RHP_{x,20})]$ for different combinations of strategies $\theta_x$ and $\theta_y$, determine the unique ESS and derive the distribution of the stopping time $T_s$. In this section and throughout the paper, we assume $RHP_{initial} = 10$.
First, we determine the representative strategies corresponding to $k_x = 1, 2, \dots, 8$ using equation (24). Note that a range of strategies $\theta_x$ corresponds to the same value of $k_x$, and we take the middle one as described in Sect. 3.2. We obtain the following mappings (the same values apply for individual y as well).
For this set of strategies, we then calculate the expected payoffs $E[\ln(RHP_{x,20})]$ for individual x and $E[\ln(RHP_{y,20})]$ for individual y using equation (21). Table 1 represents the matrix of payoffs for different combinations of strategies $\theta_x$ and $\theta_y$. For each strategy, we can now find the best response, i.e. for each column of Table 1 we find the highest payoff, and use the "diagonal rule" to find the ESS. The diagonal rule states that if a value on the diagonal of the payoff matrix is larger than all other values in the same column, then the corresponding pure strategy is an ESS. We note that all our pure ESSs satisfy ESS condition 1; condition 2 only comes into play when mixtures are present, which we do not get in our example. In this example, we obtain θ = 0.61, corresponding to k = 3, as the unique ESS. Note that the range of strategies $[\theta_{x,min}, \theta_{x,sup}) = [0.55, 0.67)$ corresponds to k = 3. Thus, any strategy from this range results in the same expected payoff and is therefore equivalent to our ESS. Lastly, we derive the distribution of the stopping time $T_s$. For example, when $\theta_x = 0.5$ (corresponding to $k_x = 4$) and $\theta_y = 0.7$ (corresponding to $k_y = 2$), $T_s$ can only assume the values $k_y + 2n$, $n \ge 0$, because $k_x + k_y = 6$ is an even number [see equation (26)]. But how does this distribution change when $k_x$ and $k_y$ are varied? To explore this, we assume that individual x has a strategy $\theta_x$ corresponding to $k_x = 1, 2, 3$ and its opponent has strategies $\theta_y$ corresponding to $k_y \in \{1, \dots, 8\}$. We choose 8 as an arbitrary, sufficiently large cut-off for $k_y$, corresponding to small values of θ; any other high value could have been chosen. Figure 2 shows the distribution functions of the stopping time for various combinations of $k_x$ and $k_y$ for V = C = 0.1.
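The diagonal rule, together with condition 2 for ties, can be applied mechanically to any matrix of expected payoffs. The Python sketch below (the name `pure_ess_indices` is hypothetical) assumes that rows index the focal player's strategy and columns the opponent's, matching how Table 1 is read column by column; a small tolerance is used because the payoffs are floating-point numbers. It is illustrated on a toy Hawk-Dove matrix rather than on Table 1.

```python
def pure_ess_indices(payoff, tol=1e-12):
    """Indices i whose strategy is a pure ESS of a symmetric two-player game.

    payoff[i][j] is E[theta_i, theta_j]: the expected payoff of a player using
    strategy i against an opponent using strategy j (rows: focal, columns: opponent).
    """
    n = len(payoff)
    result = []
    for i in range(n):
        stable = True
        for j in range(n):
            if j == i:
                continue
            if payoff[i][i] > payoff[j][i] + tol:
                continue  # condition 1: the diagonal entry beats entry j of column i
            if abs(payoff[i][i] - payoff[j][i]) <= tol and payoff[i][j] > payoff[j][j] + tol:
                continue  # condition 2: a tie, broken in favour of strategy i
            stable = False
            break
        if stable:
            result.append(i)
    return result


# Toy Hawk-Dove matrix with resource value 2 and fighting cost 1 (Hawk is index 0).
print(pure_ess_indices([[0.5, 2.0], [0.0, 1.0]]))  # -> [0]
```

Feeding in a matrix built from equation (21) over the representative strategies would apply exactly the column-wise comparison described above.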
Figure 2 illustrates that a pair of individuals will fight longer for higher values of $k_x$ and $k_y$. The reason is that larger values of k correspond to smaller strategy values θ, and hence, by equation (1), the individuals will fight longer. In this example, for most of the possible cases one of the individuals x and y has started retreating before time T. This means that observing 20 interactions would allow us to distinguish between the two individuals almost with certainty. As we increase the values of $k_x$ and $k_y$, the probability of retreating before T = 20 decreases. Table 1 shows the expected payoffs of individuals x and y after $T_{max} = 20$ possible interactions, computed using equation (21).

An Alternative Payoff Function
In this section, we explore how limited resources are divided between the two individuals based on an alternative payoff function. We use the concept of reproductive skew as discussed in Broom et al. (2009), Keller and Reeve (1994), Reeve and Keller (2001), Shen and Reeve (2010) and Vehrencamp (1983). In this case, the expected payoff for individual x after 20 interactions is given by the proportion of the total RHP that it holds:
$$E\left[\frac{RHP_{x,20}}{RHP_{x,20} + RHP_{y,20}}\right]. \tag{27}$$
Consequently, the expected payoff for individual y is given by
$$E\left[\frac{RHP_{y,20}}{RHP_{x,20} + RHP_{y,20}}\right] = 1 - E\left[\frac{RHP_{x,20}}{RHP_{x,20} + RHP_{y,20}}\right].$$
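The proportional payoff of equation (27) can be computed with the same forward recursion as the logarithmic payoff, except that both individuals' final RHPs must be tracked. The sketch below is hedged: `d_value` and `expected_share` are hypothetical names, and it assumes the strict fighting rule of equation (1), $V_1 = V_2 = V$, $C_1 = C$, $C_2 = 0$, and that a double kowtow leaves both RHPs unchanged.

```python
import math


def d_value(theta: float, V: float, C: float) -> float:
    """d = ln(theta) / ln((1-C)/(1+V)); theta = 0 means 'never retreat'."""
    if theta == 0:
        return math.inf
    return math.log(theta) / math.log((1 - C) / (1 + V))


def expected_share(theta_x, theta_y, V, C, T, rhp0=10.0):
    """E[RHP_{x,T} / (RHP_{x,T} + RHP_{y,T})] in the two-individual model (C2 = 0)."""
    dx, dy = d_value(theta_x, V, C), d_value(theta_y, V, C)
    q = (1 + V) / (1 - C)
    prob = {0: 1.0}  # P(a_t - b_t = diff and nobody has retreated yet)
    total = 0.0
    for t in range(T + 1):
        nxt = {}
        for diff, p in prob.items():
            a, b = (t + diff) // 2, (t - diff) // 2
            rx = rhp0 * (1 + V) ** a * (1 - C) ** b  # R_{a,b}
            ry = rhp0 * (1 + V) ** b * (1 - C) ** a  # R_{b,a}
            stop_x, stop_y = diff <= -dx, diff >= dy
            if stop_x and not stop_y:
                ry *= (1 + V) ** (T - t)  # y gains in every remaining round
            elif stop_y and not stop_x:
                rx *= (1 + V) ** (T - t)  # x gains in every remaining round
            elif not stop_x and t < T:
                # Both still fight: branch on the outcome of the next contest.
                w = q ** diff / (1 + q ** diff)
                nxt[diff + 1] = nxt.get(diff + 1, 0.0) + p * w
                nxt[diff - 1] = nxt.get(diff - 1, 0.0) + p * (1 - w)
                continue
            total += p * rx / (rx + ry)  # proportion of the total RHP held by x
        prob = nxt
    return total


print(expected_share(0.61, 0.4, 0.1, 0.1, 20))
```

Because the two shares always add up to one, `expected_share(a, b, ...) + expected_share(b, a, ...)` should equal 1, which is a quick consistency check.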
The results are given in Table 2.
From Table 2, we find that θ = 0.4 (corresponding to k = 5) is the ESS. Comparing this result with that obtained from Table 1, we notice that they differ: with the alternative payoff function, the ESS is k = 5, while for the original payoff function used in Sect. 3.4, the ESS is k = 3. These differences are related to the amount of available resources, in particular whether they are plentiful or limited. We assume that for plentiful resources the absolute RHP is more important, whereas for scarce resources shared between group members the relative RHP is the key element. If an individual needs to maximise its RHP, then it should fight less than in the situation where it needs to maximise its share of limited resources. In the latter case, the individual needs to be more aggressive so that it can win a greater share than its opponent, since "hurting" its opponent directly improves its proportion in equation (27).

How the Expected Payoffs and the Division of Resources Change When Varying V and C
In this section, we vary the value of V and fix the value of C (C = 0.1), noting that different combinations of V and C correspond to different values of k for any given value of θ. For each of these combinations, we find the ESS (θ and the corresponding k) when ln(RHP) is used as the payoff function and when the alternative payoff function is used. The results are summarised in Figs. 3 and 4, where we plot the ratio V/C with C = 0.1 on the x-axis and the best strategy on the y-axis (optimal k in Fig. 3 and best θ in Fig. 4). For the case when V = 0 and C > 0, we expect the ESS to be the strategy where an individual retreats immediately. This is true when ln(RHP) is used as the payoff function. When the alternative payoff function is used, we obtain k = 1 (θ = 1) as the ESS (for C = 0.1). Thus, in this case it is best to fight initially to potentially reduce the RHP of the opponent, as this increases the individual's payoff. On the other hand, for C = 0 and V > 0, we obtain k → ∞ as the ESS. This is the expected result: since there is no cost for losing, it is best to fight until the end of the competition. When V/C ≤ 4, we obtain lower values of θ as an ESS for the alternative payoff function than for the payoff function given by ln(RHP). This means that when resources are scarce, individuals need to be more aggressive in order to obtain a high payoff. For a sufficiently high V/C ratio (e.g. V/C > 4), we obtain the same value of θ as an ESS for both payoff functions. The corresponding tables showing the expected payoffs for different combinations of $k_x$ and $k_y$ when V and C vary are given in the Appendix.
Table 2 Division of resources for different values of k
The evolutionarily stable strategy θ for variable V and fixed C (C = 0.1) for ln(RHP) and the alternative payoff function. When C = 0, the ESS will be the highest possible value of k (C → 0 ⇒ k → ∞)

The N-Individual Model
In Sect. 3, we demonstrated how the expected payoff can be derived analytically for two interacting individuals. Generalising these results to more than two individuals, however, has proven analytically intractable. To nevertheless gain insight into the behaviour of larger groups, we develop a simulation approach that determines the ESS for N interacting individuals. We imagine a population of 10,000N individuals, which at the start of the game is divided at random into 10,000 groups of size N. Members within each group interact as previously described, for a total of 200 contests, and record their payoffs (this corresponds to steps S1-S2.3). The individuals then produce offspring in proportion to their payoffs to form a new generation of 10,000N individuals. This process is repeated for 10,000 generations (this corresponds to step S3). The algorithm that implements our approach is defined as follows.
Set j = j + 1. If j < 10,000, go to S2.0; otherwise go to S3.
S3 Update the probability function p(θ = θ_k) as follows.
Set i = i + 1. If i < 10,000, go to S2.0; otherwise the simulation is finished.
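The evolutionary loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the contest rules inside a group (a random pair per contest; an individual is aggressive iff its RHP relative to its opponent's is at least its θ; a mutual fight is won with probability RHP_x/(RHP_x + RHP_y); the winner's RHP is multiplied by 1 + V and the loser's by 1 − C; an individual that retreats keeps its RHP while its opponent gains the factor 1 + V) and the shift of ln(RHP) payoffs to non-negative fitness are all assumptions. The function names `play_group` and `evolve` are hypothetical, and the default population sizes are far smaller than the 10,000 groups and 10,000 generations used in the text.

```python
import math
import random


def play_group(thetas, V, C, contests, rng):
    """Play one group's contests and return each member's ln(RHP) payoff.

    Hypothetical rules: all members start with RHP = 1; each contest pairs two
    random members; a member is aggressive iff its RHP relative to the
    opponent's is at least its own threshold theta.
    """
    n = len(thetas)
    rhp = [1.0] * n
    for _ in range(contests):
        x, y = rng.sample(range(n), 2)
        x_fights = rhp[x] / rhp[y] >= thetas[x]
        y_fights = rhp[y] / rhp[x] >= thetas[y]
        if x_fights and y_fights:
            # Mutual aggression: x wins with probability RHP_x / (RHP_x + RHP_y).
            if rng.random() < rhp[x] / (rhp[x] + rhp[y]):
                winner, loser = x, y
            else:
                winner, loser = y, x
            rhp[winner] *= 1 + V
            rhp[loser] *= 1 - C
        elif x_fights:
            rhp[x] *= 1 + V  # y retreats: x wins uncontested, y's RHP unchanged
        elif y_fights:
            rhp[y] *= 1 + V  # x retreats
        # if both retreat, nothing changes
    return [math.log(r) for r in rhp]


def evolve(grid, N, V, C, groups=100, contests=200, generations=50, seed=1):
    """Return the final frequency p(theta = theta_k) of each strategy in grid."""
    rng = random.Random(seed)
    population = [rng.choice(grid) for _ in range(groups * N)]
    for _ in range(generations):
        rng.shuffle(population)  # random division into groups of size N
        payoffs = []
        for g in range(groups):
            members = population[g * N:(g + 1) * N]
            payoffs.extend(play_group(members, V, C, contests, rng))
        # ln(RHP) can be negative, so payoffs are shifted to non-negative
        # fitness before reproduction (a hypothetical choice).
        low = min(payoffs)
        weights = [p - low + 1e-9 for p in payoffs]
        # Offspring drawn in proportion to fitness; population size stays fixed.
        population = rng.choices(population, weights=weights, k=len(population))
    return {theta: population.count(theta) / len(population) for theta in grid}
```

For example, `evolve([0.45, 0.6, 0.75, 0.9], N=4, V=0.1, C=0.1)` returns the final frequency of each candidate θ (it may take a few seconds). When the mass is not concentrated on a single value, the mean strategy can be taken as the ESS, as described in the text.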
The outcome of this algorithm is the probability vector p(θ = θ_k); in most cases, the probability mass is concentrated on a single strategy θ_k, which represents the ESS. When this is not the case, the mean value of the strategies at the end of the simulation (i.e. after 10,000 generations) is taken as the ESS. To check the accuracy of the simulation algorithm, we consider the same parameter constellation as in Sect. 3.4, namely N = 2 and V = C = 0.1, and determine the ESS. We obtain p(θ = 0.6) = 1 and conclude that θ = 0.6 is the ESS, which falls within the range [0.55, 0.67] obtained from equation (21). We considered other values of V and C as well, and in all cases the analytical and simulation results coincided. Now we consider a group of N = 4 individuals and use the simulation algorithm described above to determine the ESSs for different combinations of V and C; the results are shown in Table 3 and Fig. 5.
The ESS values show that when C is increased for a fixed value of V, the value of θ also increases. This means that individuals fight less as, for example, the cost of injury increases. On the other hand, when V is increased for a fixed C, the value of θ decreases, and thus individuals fight for longer. If V = C, the value of the ESS decreases when V and C are simultaneously increased by the same factor. This is supported by the results for V = C = 0.05, V = C = 0.1 and V = C = 0.15, which have ESSs of 0.6, 0.49 and 0.45, respectively. For N = 2, there is a range of strategies θ that correspond to the same critical value k of the excess number of defeats leading to concession; this range is determined by equation (23). Next, we compare the ESSs when the group size is increased from 2 to 4 individuals. In Table 4, we show the values of the ESS for these two group sizes for several combinations of V and C. We conclude that as the group size increases, the values of the strategies θ also increase. This implies less aggressiveness in larger groups: it is best to fight less in larger groups than in smaller ones, because an individual that fights for longer may lose against each of its three opponents and thus suffer a larger loss in RHP.
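For N = 2, the correspondence between θ and k can be illustrated with a short sketch. It rests on an assumed retreat rule: individual x retreats as soon as RHP_x/RHP_y < θ_x, and each win multiplies RHP by 1 + V and each loss by 1 − C, so that RHP_x/RHP_y = q^d with q = (1 + V)/(1 − C) and d the number of wins minus losses of x. Under these assumptions the sketch reproduces the values quoted for V = C = 0.1 (θ = 0.6 and the range [0.55, 0.67] for k = 3, θ = 0.4 for k = 5), but it is not a transcription of equation (23), and the function names are hypothetical.

```python
import math


def k_from_theta(theta, V, C):
    """Excess number of defeats at which an individual with threshold theta retreats.

    Assumed rule: retreat at the first d with q**d < theta, where
    q = (1 + V) / (1 - C) > 1, so k is the smallest integer k >= 0
    with q**(-k) < theta.
    """
    q = (1 + V) / (1 - C)
    return max(0, math.floor(math.log(1 / theta) / math.log(q)) + 1)


def theta_range(k, V, C):
    """Interval (lower, upper] of thresholds theta that share the same k >= 1."""
    q = (1 + V) / (1 - C)
    return q ** (-k), q ** (-(k - 1))
```

For V = C = 0.1, `theta_range(3, 0.1, 0.1)` gives roughly (0.548, 0.669], which rounds to the interval [0.55, 0.67] quoted in the text; every θ inside it yields the same k and hence the same payoff.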

Comparison of Strategies
In the above sections, we have shown how the ESS can be calculated for different values of C and V. We now explore whether knowledge of the ESS in a specific situation characterised by V and C allows us to infer the ESS for a related situation with αV and αC (for sufficiently small α). As in the Hawk-Dove game, the ratio V/C, rather than the specific values of V and C, might be the most important determinant of the expected payoffs (if V < C, the ESS of the Hawk-Dove game is simply to play Hawk with probability p = V/C). This means that if we know the ESS for small values of V and C, we can also calculate the ESS for αV and αC. Equation (28) holds, where θ_x is the strategy of individual x. If we multiply V and C by α, we obtain equation (29), where θ'_x is the strategy of individual x when V and C become αV and αC, respectively. From equations (28) and (29), we obtain
θ'_x = θ_x^α. (30)
This means that if, for a given sequence of wins and losses, individual x retreats following strategy θ_x, it will retreat after the same sequence following strategy θ'_x = θ_x^α when V and C are replaced by αV and αC, respectively (assuming that rescaling V and C by α in this way does not affect the choice of k_x). Thus, if only the ratio V/C matters for finding the ESS and θ_x is the ESS for V and C, then θ'_x will be the ESS for αV and αC. We illustrate this point with an example. We assume the parameter constellation N = 2, V = 0.02, C = 0.04 and α = 3/2, and use the simulation algorithm given in Sect. 4 to determine the ESS. We obtain θ_x = 0.91 (corresponding to k_x = 2) as the ESS for V = 0.02, C = 0.04, and θ'_x = 0.87 (corresponding to k_x = 2) for αV = 0.03 and αC = 0.06. When we use formula (30) and take θ_x = 0.91 as the baseline ESS (V = 0.02, C = 0.04), we obtain θ'_x = 0.91^(3/2) ≈ 0.868 as the new ESS, which is close to the value of 0.87 obtained from the simulations. Thus, the simulation results support formula (30).
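A heuristic route to relation (30) can be sketched under an assumed retreat rule: individual x keeps fighting while RHP_x/RHP_y = q^d ≥ θ_x, with q = (1 + V)/(1 − C) and d the excess of wins over losses of x. This is an illustration of why a power law in α is plausible for small V and C, not a reconstruction of equations (28) and (29).

```latex
% Assumed retreat rule (not taken from the paper): x keeps fighting while
% RHP_x / RHP_y = q^d >= theta_x, with q = (1+V)/(1-C), d = wins - losses of x.
\begin{align*}
  q^{d} \ge \theta_x
    &\iff d \,\ln\frac{1+V}{1-C} \ge \ln\theta_x, \\
  \ln\frac{1+V}{1-C} = \ln(1+V) - \ln(1-C)
    &\approx V + C \qquad (V, C \text{ small}), \\
  d\,\alpha (V + C) \ge \ln\theta'_x
    &\iff d\,(V + C) \ge \tfrac{1}{\alpha}\ln\theta'_x, \\
  \text{same critical } d \text{ as for } (V, C)
    &\iff \tfrac{1}{\alpha}\ln\theta'_x = \ln\theta_x
    \iff \theta'_x = \theta_x^{\alpha}.
\end{align*}
```

Because the second line relies on ln(1 + V) − ln(1 − C) ≈ V + C, the relation can only be expected to hold when V and C (and the rescaled αV and αC) are small.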
We have also analysed the values α = 2, 1/2, 1/5 and 5, and in all cases we obtain an ESS corresponding to k_x = 2. We conclude that equation (30) gives a good approximation for the ESS. This holds whenever V and C are small; however, there are cases where it works less well, principally where V or C (or α, if it leads to large values of αV or αC in the comparative model) is large. We note that the larger V, C and T are, the more unrealistic it is to multiply the RHP by a constant after every contest. On the other hand, the smaller T is, the more often we cannot distinguish between a pair of individuals because neither of them has retreated. Thus, a realistic model should only contain relatively small V and C.
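The behaviour of approximation (30) can be probed numerically with the same hypothetical retreat rule used as a stand-in for the model (retreat once RHP_x/RHP_y = q^d < θ_x, with q = (1 + V)/(1 − C)). The sketch rescales the baseline θ_x = 0.91 (V = 0.02, C = 0.04) to θ_x^α and checks which k the rescaled threshold implies for αV and αC; the helper names are illustrative only.

```python
import math


def k_from_theta(theta, V, C):
    """Retreat point k under the assumed rule q**d < theta, q = (1+V)/(1-C) > 1."""
    q = (1 + V) / (1 - C)
    return max(0, math.floor(math.log(1 / theta) / math.log(q)) + 1)


def rescaled_theta(theta, alpha):
    """Approximation (30): threshold for alpha*V, alpha*C given theta for V, C."""
    return theta ** alpha


V, C, theta = 0.02, 0.04, 0.91
for alpha in (1.5, 2, 1 / 2, 1 / 5, 5):
    new_theta = rescaled_theta(theta, alpha)
    k = k_from_theta(new_theta, alpha * V, alpha * C)
    print(f"alpha={alpha:.2f}  theta'={new_theta:.3f}  k={k}")
```

Under these assumptions, α = 3/2, 2, 1/2, 1/5 and 5 all keep k = 2, whereas a large rescaling such as α = 20 (V = 0.4, C = 0.8) gives k = 1, which illustrates why the approximation degrades for large V and C.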

Discussion
In this paper, we have introduced game-theoretical elements into the winner-loser model developed by Dugatkin (Dugatkin 1997; Dugatkin and Dugatkin 2007). We considered a group of individuals characterised by their fighting ability score (their RHP) and a strategy θ that determines whether an individual engages in an aggressive interaction or retreats. All individuals were assumed to possess the same RHP initially. We have developed a model that determines the expected payoff and the ESS in such a population for different group sizes and payoffs involving V and C.
In the first part of this paper, we derived analytical results for the expected payoff in a group of two individuals and found the ESS, using ln(RHP) as the payoff function, which corresponds to situations with unlimited resources. To calculate the expected payoff for individual x with strategy θ_x, we first found the condition under which this individual retreats, represented by k. The variable k is the critical difference between the number of wins and losses below which individual x retreats. Since a win increases the RHP, the value of k corresponds to a difference in RHP, and thus only individuals with a high RHP relative to their opponent risk engaging in an agonistic interaction to obtain more access to the available resources. We showed that there is a range of strategies θ_x that correspond to the same value of k and hence give the same payoff. Furthermore, different combinations of V and C yield different ranges of θ_x for any given value of k.
We illustrated this analytical part with an example in which V = C = 0.1. We found the expected payoff for different strategies θ ≥ 0. In this case, we obtained a pure ESS at k = 3, corresponding to the θ range [0.55, 0.67]; any strategy from this range gives the same payoff and is an ESS. We next varied V and C and examined the effect of these variations on the expected payoff and the ESS. As expected, if V is increased for a fixed C, individuals fight more, corresponding to lower values of θ. On the other hand, if C is increased for a fixed V, we obtain larger values of θ as an ESS, meaning that individuals fight less as C increases.
We also used the idea of reproductive skew (Broom et al. 2009; Keller and Reeve 1994; Reeve and Keller 2001; Shen and Reeve 2010; Vehrencamp 1983) to study how scarce resources are divided between a pair of individuals, using the alternative payoff function given in equation (27). Compared with the results for the original payoff function, we observe smaller values of θ as an ESS. This means that in this case individuals need to be more aggressive in order to obtain a larger share of the available resources. While in our model, and in those of Dugatkin (1997) and Dugatkin and Dugatkin (2007), linear hierarchies are generally formed efficiently when (i) winner and loser effects are both present, (ii) only the winner effect is present or (iii) only the loser effect is present, the three models give clearly distinct predictions. With only the winner effect present, individuals in our model (for optimal strategy choice) and in that of Dugatkin (1997) will continue fighting indefinitely, whereas in Dugatkin and Dugatkin (2007) individuals start fighting, but contests eventually cease. With only the loser effect present, individuals would give up immediately in our model (at least for the plentiful-resources case defined by payoff function (11)), would give up after the first loss in the model of Dugatkin (1997), and would fight for a longer period in the model of Dugatkin and Dugatkin (2007). These differences between the three models are rooted in their modelling assumptions. In Dugatkin (1997), there is no strategic choice and individuals do not know their opponent's RHP; in Dugatkin and Dugatkin (2007), there is no strategic choice, but individuals do know their opponent's RHP; and in our model, there is strategic choice and the opponent's RHP is known. Thus, Dugatkin and Dugatkin (2007) can be thought of as an intermediate model between the other two. However, the predictions of our model are closer to those of Dugatkin (1997) than to those of Dugatkin and Dugatkin (2007), and we would argue that they are more realistic.
Other authors have considered alternative game-theoretical models of dominance hierarchy formation. A good recent survey, which raises some interesting questions and suggestions for further modelling, is Mesterton-Gibbons et al. (2016). We shall discuss two such models. Van Doorn and co-workers analysed the evolution of dominance hierarchies assuming that individuals are identical in ability throughout their interaction; thus, while their strategic choices depend on past results, the actual probability of winning a contest depends on the strategic choices of the individuals rather than on their actual abilities. This is an example of what Maynard Smith (1982) called an uncorrelated asymmetry (as opposed to a correlated asymmetry, as in our model). They found several evolutionary equilibria, one of which was the "dominance" equilibrium with winner and loser effects, in which previous winners were more likely to take part in aggressive interactions and previous losers were less likely to be aggressive. They also found a paradoxical equilibrium, in which the higher position was occupied by the loser of an aggressive interaction rather than the winner. These results are very similar to those of the owner-intruder game (Maynard Smith 1982), where paradoxical convention-based outcomes can occur. They then extended this model to larger groups, where the individuals still had limited information about previous fights. As in the two-player model, several evolutionary equilibria were found, one of them involving winner and loser effects. The assumptions and outcomes are thus rather different from those of our model. Fawcett and Johnstone (2010) developed a model to analyse the level of aggression in which individuals differed in strength but had no information about this difference. They predicted that the level of aggression is related to the amount of information that an individual has about prior contests.
While young individuals should be more aggressive, as they are not sure about their fighting ability, older ones should not: they have knowledge from prior experience and retreat after a series of losses. Although the mechanisms differ, the way the populations evolve is quite similar to ours. In their model, there are real differences between individuals, but the individuals start with no knowledge and learn over time; in our model, individuals have varying probabilities of winning a contest, which change over time (perhaps due to psychological factors). In each case, after a while it becomes clear which individuals are the better ones, and the level of aggressive interactions declines as more individuals play the more passive strategy. We note that in their model, the eventual division into mainly aggressive strong individuals and mainly passive weak individuals depends on an intermediate number of strong/weak individuals, and that this division would not happen for all population compositions.
In each of the strategic models discussed above (those of Van Doorn and co-workers and of Fawcett and Johnstone (2010), in addition to ours), individuals face a potentially long sequence of contests with two options at each step. Thus, as in games such as the classical iterated prisoner's dilemma (Axelrod 1984), there is a vast array of potential strategies, and each model reduces the dimensions of this strategy space in a different way. In the models of Van Doorn and co-workers, individuals were constrained to remember only their latest interaction with an individual and so could base their play only on its result ("tit for tat" in the iterated prisoner's dilemma is such a strategy). Fawcett and Johnstone (2010) allow individuals to know their performance in all past contests, but allow them to condition play only on the total number of contests encountered, together with the number of wins in these contests. Our model behaves in a similar way to that of Fawcett and Johnstone (2010), basing strategy on the RHP, which in turn depends directly on the numbers of contests won and lost by the participating individuals.
Results similar to those of our model concerning aggression levels have been found in experimental settings. Kotrschal et al. (1993) performed a feeding experiment with greylag geese, in which grain was provided at high, medium and low density. The geese were fed twice daily, and the level of aggression was recorded. They found a low number of agonistic interactions at high food density and an increase in aggressive interactions when the food density was decreased. Nie et al. (2013) conducted feeding experiments with root voles under varying levels of predation. They considered four treatments combining different levels of predation and food supply, i.e. (no predation, food), (predation, food), (predation, no food) and (no predation, no food). They observed higher levels of aggressiveness in the groups exposed to unfavourable conditions such as (predation, no food) than in groups treated with (no predation, food). When the groups were treated with (predation, food) or (no predation, no food), the level of aggression observed was intermediate. These findings support our result that if resources are scarce, an individual needs to be more aggressive.
An important concept related to the expected payoff is the stopping time, defined as the first time at which one of the two individuals reaches its stopping value of k. It indicates how many agonistic interactions we need to observe in a pair of individuals before one retreats. After the stopping time, the individual that has reached its value of k always retreats. We showed in our example that twenty possible interactions are enough for an individual to retreat in almost all cases. Note that if T_max is considerably larger than the stopping time, the continued increase in the winner's RHP after the stopping time is unrealistic. If, however, T_max is smaller than the stopping time, it is more difficult to distinguish between a pair of individuals in terms of their ranks in the hierarchy.
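The distribution of the stopping time for a pair can be computed exactly by tracking probability mass, rather than by simulation, under assumptions not stated in this section: both individuals fight until one reaches its k, RHP_x/RHP_y = q^d with q = (1 + V)/(1 − C) and d the excess of wins over losses of x, and x wins each fight with an assumed probability RHP_x/(RHP_x + RHP_y). The function name `stopping_time_cdf` is hypothetical.

```python
def stopping_time_cdf(k_x, k_y, V, C, t_max):
    """Return [P(T <= t) for t = 1..t_max] for a pair of individuals (k_x, k_y >= 1).

    Hypothetical dynamics: both fight until x's excess of defeats reaches k_x
    or y's reaches k_y. With d = x's wins minus losses, RHP_x / RHP_y = q**d,
    q = (1 + V) / (1 - C), and x wins a fight with probability q**d / (1 + q**d).
    """
    q = (1 + V) / (1 - C)
    alive = {0: 1.0}  # probability of each d among pairs where nobody has retreated
    stopped = 0.0
    cdf = []
    for _ in range(t_max):
        nxt = {}
        for d, mass in alive.items():
            p_win = q ** d / (1 + q ** d)
            for step, prob in ((1, p_win), (-1, 1 - p_win)):
                new_d = d + step
                if new_d >= k_y or new_d <= -k_x:
                    stopped += mass * prob  # one individual has reached its k
                else:
                    nxt[new_d] = nxt.get(new_d, 0.0) + mass * prob
        alive = nxt
        cdf.append(stopped)
    return cdf


cdf = stopping_time_cdf(k_x=3, k_y=3, V=0.1, C=0.1, t_max=20)
print(f"P(T <= 20) = {cdf[-1]:.3f}")
```

For k_x = k_y = 3 and V = C = 0.1 (the ESS of the V = C = 0.1 example), no pair can stop before the third contest, and under these assumptions the probability that one individual has retreated within 20 contests comes out at about 0.97.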
Analytical results can be derived for a group of two individuals, but for larger groups these derivations become effectively intractable. To explore the behaviour of larger groups, and in particular to find the ESS, we developed a simulation approach in the second part of the paper. Analysing a group of four individuals, we found that the ESS value of θ decreases when V is increased (for a fixed C) and, by contrast, increases when C is increased (for a fixed V). Comparing the ESS values for a group of two individuals with those obtained for a group of four individuals leads to the conclusion that individuals should be less aggressive (i.e. fight less) in larger groups.
This result is commonly observed in behavioural experiments, although some experimental settings lead to contradictory conclusions. For example, Nicol et al. (1999) conducted a feeding experiment with Isa Brown birds. They analysed the behaviour of the birds in groups of four different sizes (72, 168, 264 and 368). The birds were fed twice a day, and the number of aggressive pecking interactions was recorded. The results suggested a higher level of aggression in the smallest group (72) than in the larger groups (168, 264, 368). Further, Andersen et al. (2004) compared their model predictions (larger group sizes result in lower aggression levels) with results from an experiment with crossbred pigs. They considered three groups of 6, 12 and 24 pigs (which had not previously interacted with each other), which were put into pens with the same space per individual. There was one feeder per six pigs, and the pigs were fed 'Format Start' every morning. The aggressive interactions in each group were then recorded. The level of aggression decreased with increasing group size. This result was also supported by further experiments (Estevez et al. 2007; Estévez et al. 1997; Syarifuddin and Kramer 1996; Turner et al. 2001). However, Bilčík and Keeling (2000) observed aggressive behaviour in a feeding experiment with groups of 15, 30, 60 and 120 Hisex White hens and found a higher level of aggression in larger groups than in smaller ones.
Summarising, we presented a game-theoretical model that determines the evolutionarily stable aggression level in a population of N individuals for different payoff functions, involving V and C, within a winner-loser framework. Within a group, we found that the population evolves to a unique aggression threshold, indicating that all individuals adopt the same decision rule, relative to their strength, about whom to fight. Typically, the hierarchy is established quickly, with aggressive fights happening only in the early contests. Applied to real-world situations, this points to the crucial importance of the first few fights for hierarchy formation; later fights only determine the positions of lower-ranked individuals. While higher values of the cost C of losing an aggressive interaction (keeping V constant) lead to lower aggression levels in the population, the reverse is true for increasing the value V of winning an aggressive interaction (keeping C constant): the higher the value of V, the higher the aggression level in the population. Further, we predict lower aggression levels in larger groups. Our results are largely supported by experimental evidence, so we conclude that introducing game-theoretical elements into winner-loser models is a further step towards a realistic description of aggressive interactions.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.