Where Do Mistakes Lead? A Survey of Games with Incompetent Players

Mathematical models often aim to describe a complicated mechanism in a cohesive and simple manner. However, striking the right balance between being simple enough and being overly simplistic is a challenging task. Game-theoretic models frequently assume that players, whenever they choose to execute a specific action, do so perfectly. In reality, the execution of an action rarely coincides perfectly with an individual's intention, which gives rise to behavioural mistakes. The concept of incompetence of players was suggested to address this issue in game-theoretic settings. Under the assumption of incompetence, players have non-zero probabilities of executing a different strategy from the one they chose, leading to stochastic outcomes of the interactions. In this article, we survey results related to the concept of incompetence in classic as well as evolutionary game theory and provide several new results. We also suggest future extensions of the model and argue why it is important to take behavioural mistakes into account when analysing interactions among players in both economic and biological settings.


Introduction
In classical non-cooperative games, the payoffs (or outcomes) are determined directly by the players' choice of strategies. In reality, however, a player may not be capable of executing his or her chosen strategy due to a lack of skill that we shall refer to as incompetence.
In this paper, we survey a relatively recent line of research that is predicated on the assumption that player incompetence is a real and ubiquitous phenomenon that deserves deeper investigation. In the process, we also present a few recent results which, to the best of our knowledge, have not been reported elsewhere. Since we regard the topic of incompetent games as still being in its infancy, the main objective of this paper is to stimulate further research on this subject.
Naturally, players' incompetence inherently complicates a game. To prevent the added complexity from becoming unmanageable, we must impose some assumptions on the information domain and the structural form via which incompetence manifests itself. In the development so far, the following key conceptual assumption has been imposed: [A1] Incompetence manifests itself as a set of probability distributions on sets of actions available to one or more players.
While [A1] is restrictive in some ways, it allows us to recover a classical competent game as a special case of an incompetent game, namely, the game in which all of the above incompetence distributions are degenerate, so that every selected action is executed with certainty.
This immediately raises the possibility of parametrising the level of competence or, equivalently, the level of skill of a player by the 'proximity' of their incompetence distribution to such a fully competent, degenerate distribution. It also opens up the possibility of players 'learning' by reducing their levels of incompetence (equivalently, increasing their skill). We note that this kind of learning is essentially different from both discovery-based statistical learning and imitation-based machine learning.
To date, the topic of incompetent games has evolved along two distinct, but conceptually related, directions. The first of these is the study of classical non-cooperative games under the assumption that at least one player is incompetent. The second is the study of evolutionary incompetent games.
In the case of classical games under incompetence, existing analyses have assumed that all probability distributions capturing incompetence are mutually known by all players. This is plausible in the case of players who are familiar with each other's past performance (e.g. tennis players on the international tour circuit). However, there is certainly scope for relaxing that assumption. Such relaxations may give rise to interesting repeated versions of these games and to the natural trade-off known as "exploration versus exploitation".
The preceding "mutually known" assumption is not explicitly needed in the evolutionary incompetent games. It is also conceptually challenging to ascribe consciousness of such distributions to individual animals or bacteria. Nonetheless, within the evolutionary paradigm, it is reasonable to assume that the emerging equilibrium frequencies of species types have incorporated the mutual incompetence uncertainties in their adaptation to the ecosystem. The uncertainties stemming from the incompetence are thus simply built into the replicator dynamics of the game.
This review paper is structured as follows. We introduce incompetence first in classical non-cooperative games and then in evolutionary games. In Sect. 2, specifically in Sects. 2.1-2.4, we provide an overview of the formal definitions and results on games with incompetent players in classical settings. Then, in Sect. 2.5 we introduce a new concept of incremental learning in the games with incompetent players and derive some of its properties. In Sect. 3, we define evolutionary games with incompetence and provide an overview of results on these games. Additionally, we provide a rationale for the importance of considering these games in biological settings. We conclude by suggesting possible directions for future extensions of games with incompetence.

Incompetent Classical Non-cooperative Games
Chronologically, Beck and Filar [10] introduced incompetence to matrix games and Beck et al. [9] introduced incompetence to bimatrix games. Essentially, by quantifying a player's tendency to accidentally deviate from their selected actions, these authors represent incompetence as stochastic matrices that can be used to account for these deviations. The application of incompetent classical games to military planning is discussed in [8] and [7].
We note that the notion of incompetence introduced here is superficially similar to several concepts used to measure the sensitivity of equilibria to changes in a game's parameters. For example, Selten [63] imagines players having "trembling hands" that cause them to accidentally execute unintended actions with negligible probability. This is used to refine the equilibrium solution concept in extensive-form games by defining trembling hand perfect equilibria. However, unlike the notion of incompetence, a trembling hand is not intended to model players making mistakes with arbitrary or even prescribed probabilities (e.g. a tennis player who routinely places a 'passing shot' in the opponent's hitting zone).
While the concepts and some of the results are generalisable to N-person non-cooperative games, the setting of classic two-player games, meaning matrix and bimatrix games, is a natural starting point for the introduction of incompetence to game-theoretic models. Larkey et al. [45], when discussing the game "Sum Poker", observe that there are several cognitive and physical limitations that might prevent a player from finding or implementing optimal strategies. Then, seeking to classify the specific difficulties a player encounters, they propose a typology of skills consisting of Strategic Skill (the ability to select which games should be played), Planning Skill (the ability to develop a desirable strategy within a game), and Execution Skill (the ability to execute desired actions throughout a game). Although Larkey et al. [45] apply this typology to experimentally compare different strategies in "Sum Poker" under different skill limitations, they do not provide a precise mathematical formulation of skill. The notion of incompetence mainly addresses execution skill, as it quantifies accidental (ipso facto, unintended) deviations from a player's chosen strategy.
The concept of incompetence is a useful modelling tool in traditional game-theoretic settings because it captures a player's inability to precisely control the outcome of their actions. Real-world strategic interactions are often complicated, and a player's intentions might not be perfectly realised due to noise from their environment. For instance, a tennis player is unable to control the exact trajectory of a shot and might sometimes make consequential mistakes. Similarly, in the economic context, it is conceivable that a firm in the classic Cournot oligopoly model is unable to guarantee the quantity of goods it produces, perhaps due to sporadic errors occurring during production. The latter may reflect flaws in the firm's quality control regime, which is itself related to the firm's level of competence. The players in these situations must accept some degree of variability in the outcomes of their actions, and incompetence is a method capable of accounting for this variability.
In this paper, in the context of classical games, we review the introduction of incompetence and present several related, previously unpublished results. Moreover, we discuss and solve a simple model for incrementally learning to decrease incompetence during a repeatedly played incompetent matrix game.

Incompetent Evolutionary Games
Game theory applied to populations of species has branched out from non-cooperative game theory and evolved into an independent field called evolutionary game theory [28,56,68]. The setup of evolutionary games differs from that of classical games in its most basic assumption: the rationality of players. Obviously, one cannot expect rational behaviour from individual bacteria or lions. However, selection still acts "rationally", in line with the classic principle of the survival of the fittest [69]. Hence, in evolutionary games, players (animals) do not consciously choose which strategy to play, as humans do in economic settings. Instead, individuals are born with a strategy inherited from their parents. The competition then happens at the genetic level, which encodes the strategy. Incidentally, the question of how cooperation evolves in biology is still open and is possibly related to the coexistence of different strategies at equilibrium. We note that a recent paper [2] invoked thermodynamics to shed light on cooperation in evolutionary games.
The concept of incompetence was considered in biological settings when studying the evolution of social behaviour [36][37][38][39][40]. That is, incompetence now acts at the level of selection rather than at the level of the organisms themselves. Incompetence can then be seen as behavioural plasticity leading to mixed strategies executed at a genetic level. As a result, we do not require organisms to be aware of the probability distributions of all genes or of their own mistakes. Instead, selection forces act upon these distributions, driving the competition among types. Such plasticity can be of variable degree, depending on the environmental conditions and the adaptations of organisms; this degree corresponds to the level of incompetence discussed above.
Naturally, the idea of behavioural plasticity or stochasticity is not novel in the field of evolutionary games, and it has recently become one of the field's main foci. Many approaches have been considered in biological settings, such as genetic mutations [12,57,70,72], learning processes [23,29,41,42,51,57,64], adaptation dynamics [47], phenotypic plasticity [15], noise in continuous- and discrete-time replicator dynamics [4,6,19,22] and environmental fluctuations [75,81]. Thus, the notion of incompetence of players merely fills a new niche in which behavioural stochasticity is induced only at the moment of interaction.
Let us demonstrate this concept on the well-studied Hawk-Dove game [67]. Imagine that individuals in a large, well-mixed population compete for some resource. Two behavioural strategies are available in the population: a Hawk (aggressive) strategy and a Dove (passive) strategy. That is, Hawks fight for the resource, while Doves prefer to share equally and flee when attacked. This game has a payoff matrix with the same structure as the Chicken and Snowdrift games. As in these classical examples, both strategies coexist stably at equilibrium. The game can be illustrated as an interaction between a naturally aggressive person and a naturally passive one. Of course, a counter-attack in response to aggression is not something one naturally expects from a passive person. However, behavioural plasticity induced by incompetence can lead to situations in which a passive player responds aggressively or an aggressive player flees instead of fighting. While their strategies did not change as such, the behaviour exhibited by individuals was altered. In [37], it was shown that such an assumption may lead to different evolutionary outcomes and may change the way selection is realised. In this survey, we discuss these results and demonstrate them on an example of a biological game.
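To make the Hawk-Dove illustration concrete, the sketch below uses the textbook Hawk-Dove payoffs with resource value V and fighting cost C > V (Hawk against Hawk earns (V - C)/2, Hawk against Dove earns V, Dove against Hawk earns 0, and Dove against Dove earns V/2). It assumes, by analogy with the bimatrix construction of Sect. 2, that incompetence turns the payoff matrix A into Q A Q^T with a single incompetence matrix Q shared by all individuals; the symmetric error rate and all function names are hypothetical choices for illustration, not the formulation used in [37]. The function `mixed_rest_point` returns the frequency of individuals whose selected strategy is Hawk at which both selected strategies earn equal expected payoffs.

```python
from fractions import Fraction


def hawk_dove(v, c):
    """Standard Hawk-Dove payoffs; rows and columns are ordered (Hawk, Dove)."""
    v, c = Fraction(v), Fraction(c)
    return [[(v - c) / 2, v], [Fraction(0), v / 2]]


def with_incompetence(a, q):
    """A^Q = Q A Q^T, assuming both opponents share the incompetence matrix Q."""
    n = len(a)
    qa = [[sum(q[i][t] * a[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(qa[i][t] * q[j][t] for t in range(n)) for j in range(n)] for i in range(n)]


def mixed_rest_point(a):
    """Hawk frequency p at which Hawk and Dove earn the same expected payoff."""
    (a11, a12), (a21, a22) = a
    return (a22 - a12) / (a11 - a12 - a21 + a22)


def symmetric_errors(eps):
    """Hypothetical incompetence matrix: an individual switches behaviour w.p. eps."""
    eps = Fraction(eps)
    return [[1 - eps, eps], [eps, 1 - eps]]


if __name__ == "__main__":
    a = hawk_dove(2, 6)  # V = 2, C = 6
    q = symmetric_errors(Fraction(1, 10))
    p_selected = mixed_rest_point(with_incompetence(a, q))
    # Frequency of Hawk *behaviour* that is actually executed in the population.
    p_executed = p_selected * q[0][0] + (1 - p_selected) * q[1][0]
    print(mixed_rest_point(a), p_selected, p_executed)  # 1/3 7/24 1/3
```

With these made-up numbers, incompetence shifts the frequency of the selected Hawk strategy from 1/3 to 7/24, while, in this example, the frequency of executed Hawk behaviour stays at V/C = 1/3.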

Games without Incompetence
An $m \times n$ bimatrix game $\Gamma$ consists of a pair of action sets $A = \{1, 2, \ldots, m\}$ and $B = \{1, 2, \ldots, n\}$ and a pair of reward matrices $R_1 = (r_1(i, j)) \in \mathbb{R}^{m \times n}$ and $R_2 = (r_2(i, j)) \in \mathbb{R}^{m \times n}$. Here, throughout this section only, we use non-standard notation for matrix entries to accommodate additional symbols associated with incompetent games. After an action $i \in A$ is selected by Player 1 and an action $j \in B$ is selected by Player 2, they receive rewards according to the matrix entries $r_1(i, j)$ and $r_2(i, j)$, respectively. A (mixed) strategy extends this behaviour to allow for randomised action selection. Specifically, Player 1 chooses a strategy from the probability simplex $X := \{x \in \mathbb{R}^{1 \times m} : x \geq 0,\ \sum_{i \in A} x_i = 1\}$ over $A$ and Player 2 chooses a strategy from the probability simplex $Y := \{y \in \mathbb{R}^{1 \times n} : y \geq 0,\ \sum_{j \in B} y_j = 1\}$ over $B$. The resulting strategy profile $(x, y) \in X \times Y$ yields an expected reward of $v_k(x, y) := x R_k y^T$ to Player $k \in \{1, 2\}$, where $y^T$ is the transpose of $y$. We want to find the (Nash) equilibria of $\Gamma$, which capture the notion of a stable strategy profile. Precisely, $(x^*, y^*) \in X \times Y$ is an equilibrium whenever it is resilient to unilateral deviations or, equivalently,
$$v_1(x^*, y^*) \geq v_1(x, y^*) \quad \text{and} \quad v_2(x^*, y^*) \geq v_2(x^*, y)$$
for all $x \in X$ and $y \in Y$. Nash [54], in a seminal contribution to game theory, proves that every game with finitely many players and actions has an equilibrium point. Vorob'ev [79] shows that, in bimatrix games, the set of equilibria is the union of a finite collection of convex sets. These sets are called maximal Nash subsets and are the largest equilibrium-containing sets that are closed under interchanging a player's strategies. The extreme points of a maximal Nash subset are called extreme equilibria and are associated with paired non-singular square submatrices of $R_1$ and $R_2$ called kernels (see Kuhn [43]). A bimatrix game satisfying the zero-sum property $r_1(i, j) = -r_2(i, j)$ for all $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$ is called a matrix game and is described by the single matrix $R := R_1 = -R_2$.
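Because $v_1(x, y^*)$ is linear in $x$ (and $v_2(x^*, y)$ is linear in $y$), a profitable unilateral deviation exists if and only if a profitable pure-action deviation exists, so the equilibrium condition above only needs to be checked against the $m + n$ pure actions. A minimal Python sketch of that check follows; the function names and the tolerance `tol` are illustrative choices, and matching pennies serves only as a familiar test case.

```python
def expected_reward(x, r, y):
    """v(x, y) = x R y^T for row vectors x and y."""
    return sum(x[i] * r[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def unit(k, size):
    """Pure strategy that plays action k with probability one."""
    return [1.0 if t == k else 0.0 for t in range(size)]


def is_equilibrium(x, y, r1, r2, tol=1e-12):
    """True if no pure action gives either player a profitable unilateral deviation."""
    m, n = len(x), len(y)
    best_1 = max(expected_reward(unit(i, m), r1, y) for i in range(m))
    best_2 = max(expected_reward(x, r2, unit(j, n)) for j in range(n))
    return (best_1 <= expected_reward(x, r1, y) + tol
            and best_2 <= expected_reward(x, r2, y) + tol)


if __name__ == "__main__":
    r1 = [[1, -1], [-1, 1]]                 # matching pennies, a matrix game
    r2 = [[-v for v in row] for row in r1]  # R_2 = -R_1
    print(is_equilibrium([0.5, 0.5], [0.5, 0.5], r1, r2))  # True
    print(is_equilibrium([1.0, 0.0], [0.5, 0.5], r1, r2))  # False
```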
Note that, when $\Gamma$ is a matrix game, every equilibrium $(x^*, y^*) \in X \times Y$ achieves the same reward $\mathrm{val}(\Gamma) := v_1(x^*, y^*) = -v_2(x^*, y^*)$, which is called the value of the game $\Gamma$. Moreover, in recognition of von Neumann's [55] minimax theorem, the component strategies $x^*$ and $y^*$ of an equilibrium are often called (minimax) optimal strategies.
A completely mixed equilibrium (x * , y * ) ∈ X × Y of a bimatrix game Γ is an equilibrium under which every action is played with non-zero probability. If Γ has only completely mixed equilibria, then it is called a completely mixed game and has only a single (completely mixed) equilibrium (see Kaplansky [35] and Raghavan [62]). Additionally, if Γ has a maximal Nash subset containing only completely mixed equilibria, then it is called a weakly completely mixed game. Jurg et al. [34] prove that a weakly completely mixed game contains a unique completely mixed equilibrium.

Games with Incompetence
Beck et al. [9] introduce incompetence to a bimatrix game $\Gamma$ by allowing players to accidentally deviate from their intended actions. Specifically, after a pair of actions is selected from $A \times B$, incompetence randomly determines an executed action profile, also from $A \times B$, according to a predefined probability distribution. This distribution is represented by the (row-stochastic) incompetence matrices $Q_1 := (q_1(i, i')) \in \mathbb{R}^{m \times m}$ and $Q_2 := (q_2(j, j')) \in \mathbb{R}^{n \times n}$. After Player 1 and Player 2 select the actions $i \in A$ and $j \in B$, they execute the actions $i' \in A$ and $j' \in B$ with probabilities $q_1(i, i')$ and $q_2(j, j')$, respectively. The notation $\Gamma_{Q_1 Q_2}$ denotes the game $\Gamma$ played under incompetence. If the incompetence matrices $Q_1$ and $Q_2$ are unambiguous, we will often replace the subscript "$Q_1 Q_2$" with simply "$Q$" (e.g. $\Gamma_Q := \Gamma_{Q_1 Q_2}$).
Note that the original incompetence framework described by Beck et al. [9] allows the players' sets of selectable and executable actions to differ. This is especially useful for modelling actions that can be executed with variable quality. Beck and Filar [10] give an example of a capability acquisition game in which a defender, after selecting the action "Conventional Defence", may execute either "Good Conventional Defence" or "Bad Conventional Defence". Here, for the sake of notational simplicity, we assume that a player's sets of selectable and executable actions coincide.
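Suppose that Player 1 selects the action $i \in A$ and Player 2 selects the action $j \in B$. The expected reward received by Player $k \in \{1, 2\}$ under incompetence is
$$r_k^Q(i, j) := \sum_{i' \in A} \sum_{j' \in B} q_1(i, i')\, r_k(i', j')\, q_2(j, j').$$
Hence, $\Gamma_Q$ can be treated as another bimatrix game with the incompetent reward matrix $R_k^Q := (r_k^Q(i, j)) \in \mathbb{R}^{m \times n}$ belonging to Player $k \in \{1, 2\}$. Clearly, we have
$$R_k^Q = Q_1 R_k Q_2^T, \qquad k \in \{1, 2\}.$$
Note that, as an immediate consequence of this identity, an incompetent game derived from a matrix game is also a matrix game. The expected reward granted to Player $k \in \{1, 2\}$ is
$$v_k^Q(x, y) := x R_k^Q y^T = v_k(x Q_1, y Q_2)$$
for each $(x, y) \in X \times Y$.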
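The identity $R_k^Q = Q_1 R_k Q_2^T$ and the equality $v_k^Q(x, y) = v_k(xQ_1, yQ_2)$ are easy to check numerically. The sketch below uses exact rational arithmetic; the $2 \times 2$ reward matrix, the incompetence matrices and the strategies are made-up values chosen only for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction as F


def matmul(p, q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p[i][t] * q[t][j] for t in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def transpose(p):
    return [list(col) for col in zip(*p)]


def incompetent_rewards(r, q1, q2):
    """R^Q = Q_1 R Q_2^T: expected reward after both players' execution errors."""
    return matmul(matmul(q1, r), transpose(q2))


def reward(x, r, y):
    """v(x, y) = x R y^T for row vectors x and y."""
    return matmul(matmul([x], r), transpose([y]))[0][0]


if __name__ == "__main__":
    r = [[F(3), F(-1)], [F(0), F(2)]]                # hypothetical R_1, with R_2 = -R_1
    q1 = [[F(4, 5), F(1, 5)], [F(1, 10), F(9, 10)]]  # hypothetical, row-stochastic
    q2 = [[F(2, 3), F(1, 3)], [F(0), F(1)]]
    x, y = [F(1, 4), F(3, 4)], [F(1, 2), F(1, 2)]
    rq = incompetent_rewards(r, q1, q2)
    # v^Q(x, y) equals v(x Q_1, y Q_2): incompetence acts on the selected strategies.
    print(reward(x, rq, y) == reward(matmul([x], q1)[0], r, matmul([y], q2)[0]))  # True
    # The zero-sum structure is preserved: (-R)^Q = -(R^Q).
    neg = [[-v for v in row] for row in r]
    print(incompetent_rewards(neg, q1, q2) == [[-v for v in row] for row in rq])  # True
```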
Beck et al. [9] are interested not only in games with static incompetence, but also in dynamic games wherein players are able to vary their incompetence. This is captured by a pair of learning trajectories $Q_1 : [0, 1] \to \mathbb{R}^{m \times m}$ and $Q_2 : [0, 1] \to \mathbb{R}^{n \times n}$. Then, for each pair of learning parameters $\lambda, \mu \in [0, 1]$, the corresponding incompetent game $\Gamma_Q(\lambda, \mu)$ has Player 1's incompetence matrix defined as $Q_1(\lambda)$ and Player 2's incompetence matrix defined as $Q_2(\mu)$. Equivalently, $\Gamma_Q(\lambda, \mu)$ is the bimatrix game with reward matrices
$$R_k^Q(\lambda, \mu) := Q_1(\lambda) R_k Q_2(\mu)^T$$
for each Player $k \in \{1, 2\}$. Although the family of parameterised incompetent games $\Gamma_Q(\lambda, \mu)$ is interesting in its own right (see Sect. 2.4), it is also an essential building block used to construct dynamic learning games (see Sect. 2.5).
Example (attack-defence game with incompetence) Consider, as an example, a matrix game $\Gamma$ played between two aeroplane pilots, labelled "Attacker" (Player 1) and "Defender" (Player 2), competing over three sites. The attacker wants to destroy a site and the defender wants to prevent this from occurring. Precisely, we have the action sets $A = \{1, 2, 3\}$ and $B = \{1, 2, 3\}$ where, for each $i, j \in \{1, 2, 3\}$, Player 1's action $i$ means "Attack Site $i$" and Player 2's action $j$ means "Defend Site $j$". A successful attack occurs if and only if the defending pilot does not anticipate the attacking pilot's destination or, equivalently, the executed action profile $(i, j) \in A \times B$ has $i \neq j$. The attacker receives a reward $\nu_1 = 3$ when Site 1 is destroyed, $\nu_2 = 4$ when Site 2 is destroyed, and $\nu_3 = 5$ when Site 3 is destroyed. The corresponding utility matrix is
$$R = \begin{pmatrix} 0 & 3 & 3 \\ 4 & 0 & 4 \\ 5 & 5 & 0 \end{pmatrix},$$
where $R = R_1 = -R_2$. The game value of $\Gamma$ is $120/47 \approx 2.55$ and its (unique) equilibrium has the attacking strategy $x^* = \frac{1}{47}(20, 15, 12)$ and the defending strategy $y^* = \frac{1}{47}(7, 17, 23)$.
We use incompetence to capture the pilots' navigation skills and their propensity to arrive at an incorrect site after getting lost. Define their learning trajectories $Q_1, Q_2 : [0, 1] \to \mathbb{R}^{3 \times 3}$ where, for each $\lambda, \mu \in [0, 1]$, we set
$$Q_1(\lambda) := (1 - \lambda)\tfrac{1}{3} J_3 + \lambda I_3 \quad \text{and} \quad Q_2(\mu) := (1 - \mu)\tfrac{1}{3} J_3 + \mu I_3,$$
where $J_n \in \mathbb{R}^{n \times n}$ is the $n \times n$ all-ones matrix and $I_n \in \mathbb{R}^{n \times n}$ is the $n \times n$ identity matrix. Note that $Q_k = \frac{1}{n} J_n$ is called uniform incompetence and $Q_k = I_n$ is called complete competence. The game value of $\Gamma_Q(0, 0)$ is $8/3$ and the game value of $\Gamma_Q(1/2, 1/2)$ is $835/324 \approx 2.58$. Moreover, since complete competence is achieved at $\lambda = \mu = 1$ and $\Gamma_Q(1, 1) = \Gamma$, we already know that the game value of $\Gamma_Q(1, 1)$ is $120/47 \approx 2.55$. Thus, it appears that the parameterised incompetent games $\Gamma_Q(\lambda, \mu)$ move in the defender's favour as the learning parameters are increased along $\lambda = \mu$. A clearer picture of the game value's dependence on these learning parameters is given in Fig. 1, which plots the function $(\lambda, \mu) \mapsto \mathrm{val}(\Gamma_Q(\lambda, \mu))$ on the domain $[0, 1] \times [0, 1]$. Note that, in this specific example, $(\lambda, \mu) \mapsto \mathrm{val}(\Gamma_Q(\lambda, \mu))$ is piecewise linear, non-decreasing in $\lambda$ and non-increasing in $\mu$. This means that learning benefits each player when the opponent's incompetence remains fixed. Furthermore, the game value plateaus on the region $[11/47, 1] \times [26/47, 1]$, indicating that the attacker reaches their "maximum useful skill" at $\lambda = 11/47 \approx 0.23$ and the defender reaches their "maximum useful skill" at $\mu = 26/47 \approx 0.55$. Interestingly, $\Gamma_Q(\lambda, \mu)$ is also completely mixed on $(11/47, 1] \times (26/47, 1]$, which suggests a connection between complete mixedness and this game value plateau. We will further explore the general properties of parameterised incompetent games in Sect. 2.4.
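The game values quoted in this example can be reproduced exactly. The following sketch solves small matrix games by support enumeration over square supports, using rational arithmetic and the trajectories $Q_k(\cdot) = (1 - \cdot)\frac{1}{3}J_3 + (\cdot) I_3$ above. Support enumeration is a standard textbook technique chosen here for transparency, not the method used in [9] or [10], and it is only practical for small games; all function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def solve_linear(a, b):
    """Solve a z = b exactly by Gauss-Jordan elimination; return None if singular."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def indifference(mat, rows, cols):
    """Weights on `rows` (summing to one) giving every column in `cols` the same payoff."""
    k = len(rows)
    a = [[mat[i][j] for i in rows] + [-1] for j in cols] + [[1] * k + [0]]
    sol = solve_linear(a, [0] * k + [1])
    return None if sol is None else (sol[:k], sol[k])


def solve_matrix_game(mat):
    """Value and optimal strategies of a matrix game in which Player 1 maximises."""
    mat = [[Fraction(v) for v in row] for row in mat]
    m, n = len(mat), len(mat[0])
    mat_t = [list(col) for col in zip(*mat)]
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                rx, ry = indifference(mat, rows, cols), indifference(mat_t, cols, rows)
                if rx is None or ry is None or min(rx[0]) < 0 or min(ry[0]) < 0:
                    continue
                x, y = [Fraction(0)] * m, [Fraction(0)] * n
                for i, p in zip(rows, rx[0]):
                    x[i] = p
                for j, p in zip(cols, ry[0]):
                    y[j] = p
                v = rx[1]
                # Saddle-point check against every pure action of both players.
                if (all(sum(x[i] * mat[i][j] for i in range(m)) >= v for j in range(n))
                        and all(sum(mat[i][j] * y[j] for j in range(n)) <= v for i in range(m))):
                    return v, x, y
    raise ValueError("no equilibrium found on square supports")


def trajectory(lam, n=3):
    """Q(lam) = (1 - lam) J_n / n + lam I_n, as in the attack-defence example."""
    lam = Fraction(lam)
    return [[(1 - lam) / n + (lam if i == j else 0) for j in range(n)] for i in range(n)]


def incompetent_game(r, lam, mu):
    """Reward matrix Q_1(lam) R Q_2(mu)^T of Gamma_Q(lam, mu)."""
    m, n = len(r), len(r[0])
    q1, q2 = trajectory(lam, m), trajectory(mu, n)
    qr = [[sum(q1[i][t] * r[t][j] for t in range(m)) for j in range(n)] for i in range(m)]
    return [[sum(qr[i][t] * q2[j][t] for t in range(n)) for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    R = [[0, 3, 3], [4, 0, 4], [5, 5, 0]]
    half = Fraction(1, 2)
    print(solve_matrix_game(R)[0])                                # 120/47
    print(solve_matrix_game(incompetent_game(R, 0, 0))[0])        # 8/3
    print(solve_matrix_game(incompetent_game(R, half, half))[0])  # 835/324
```

The same routine also returns $120/47$ at, for example, $\lambda = \mu = 4/5$, a point inside the plateau region $[11/47, 1] \times [26/47, 1]$.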

Executable Strategies
Although [9] and [10] view incompetence as modifying a player's reward matrix, it is also possible to view it as modifying their strategy spaces. Here, we return to the setting of a static incompetent game $\Gamma_Q$ with incompetence matrices $Q_1 \in \mathbb{R}^{m \times m}$ and $Q_2 \in \mathbb{R}^{n \times n}$. Note that, after Player 1 selects a strategy $x \in X$ (or Player 2 selects a strategy $y \in Y$), the resulting executed strategy is $x Q_1$ (or $y Q_2$) once incompetence is taken into account. Which strategies are the players able to execute? Player 1 and Player 2 are able to execute the strategies in
$$E_1(Q_1) := \{x Q_1 : x \in X\} \quad \text{and} \quad E_2(Q_2) := \{y Q_2 : y \in Y\},$$
respectively. We call $E_k(Q_k)$ the executable strategy space belonging to Player $k \in \{1, 2\}$ and, when the incompetence matrices are unambiguous, we simply write $E_k$ instead. Importantly, an outside observer who only sees that players have executed strategies from $E_1$ and $E_2$ would be unable to distinguish whether they are playing the competent game $\Gamma$ or the incompetent game $\Gamma_Q$. Figure 2 shows some of the executable strategy spaces within the previously discussed attack-defence game with incompetence. Note that the transition between executable strategy spaces can be more complicated than the "growing" seen in Fig. 2.
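For learning trajectories of the form $Q(\lambda) = (1 - \lambda)\frac{1}{n}J_n + \lambda I_n$, as in the attack-defence example, the executable spaces have a simple closed form: because $x J_n = 1_n$ for every $x \in X$ (with $1_n$ the all-ones row vector), we get $x Q(\lambda) = \frac{1 - \lambda}{n} 1_n + \lambda x$, so a target strategy $z$ is executable exactly when $z_i \geq (1 - \lambda)/n$ for every $i$. The sketch below relies on that assumption about the trajectory; the function names are illustrative. It recovers the selected strategy behind a target and the smallest learning parameter at which the target becomes executable.

```python
from fractions import Fraction


def selected_strategy(target, lam):
    """Return x with x Q(lam) = target for Q(lam) = (1 - lam) J_n / n + lam I_n, or None.

    Uses x Q(lam) = (1 - lam) / n + lam * x, valid because the entries of x sum to one.
    """
    n, lam = len(target), Fraction(lam)
    if lam == 0:  # uniform incompetence: only the uniform strategy is executable
        uniform = [Fraction(1, n)] * n
        return uniform if [Fraction(t) for t in target] == uniform else None
    x = [(Fraction(t) - (1 - lam) / n) / lam for t in target]
    return x if min(x) >= 0 else None


def execution_threshold(target):
    """Smallest lam in [0, 1] with target in E(Q(lam)), i.e. max(0, 1 - n * min_i target_i)."""
    n = len(target)
    return max(Fraction(0), 1 - n * min(Fraction(t) for t in target))


if __name__ == "__main__":
    x_star = [Fraction(20, 47), Fraction(15, 47), Fraction(12, 47)]
    y_star = [Fraction(7, 47), Fraction(17, 47), Fraction(23, 47)]
    print(execution_threshold(x_star), execution_threshold(y_star))  # 11/47 26/47
    print(selected_strategy(y_star, Fraction(1, 2)))  # None, since 1/2 < 26/47
```

In this example, the thresholds at which the competent optimal strategies $x^*$ and $y^*$ become executable coincide with the plateau boundaries $\lambda = 11/47$ and $\mu = 26/47$ reported for Fig. 1.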
What is the connection between equilibria of Γ Q in X × Y and equilibria of Γ in E 1 × E 2 ? Theorem 1 gives conditions under which an equilibrium in the competent game Γ can be converted into an equilibrium in the incompetent game Γ Q , and vice versa. Theorem 1(i) implies that an equilibrium of Γ in E 1 × E 2 is always executed by an equilibrium of Γ Q . Meanwhile, Theorem 1(ii) implies that there exists an equilibrium of Γ in the interior of E 1 × E 2 provided that Γ Q is weakly completely mixed.

Lemma 1
If Γ Q is a weakly completely mixed incompetent bimatrix game, then its incompetence matrices Q 1 and Q 2 are non-singular.
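Proof We argue by contraposition. Suppose that $Q_1$ is singular. If $\Gamma_Q$ has no completely mixed equilibrium, then it is certainly not weakly completely mixed, so let $(x^*, y^*)$ be a completely mixed equilibrium of $\Gamma_Q$. Since $Q_1$ is singular, its rows are linearly dependent, that is,
$$\sum_{i \in I} \theta_i\, q_1(i, \cdot) = 0$$
for some non-empty index set $I \subseteq A$ and some non-zero coefficients $\theta_i \in \mathbb{R}$ with $i \in I$. Then, after right-multiplying by the all-ones column vector $1_m^T$ (each row of $Q_1$ sums to one), we have $\sum_{i \in I} \theta_i = 0$. Since every $\theta_i$ is non-zero and these coefficients sum to zero, we can partition the index set $I$ into the non-empty subsets $I_+ := \{i \in I : \theta_i > 0\}$ and $I_- := \{i \in I : \theta_i < 0\}$. We construct an alternative strategy $x^\dagger \in X$, which is also completely mixed, where
$$x^\dagger_i := \begin{cases} x^*_i + \varepsilon \theta_i, & i \in I, \\ x^*_i, & i \notin I, \end{cases} \qquad 0 < \varepsilon < \min_{i \in I_-} \frac{x^*_i}{|\theta_i|}.$$
Observe that
$$x^\dagger Q_1 = x^* Q_1 + \varepsilon \sum_{i \in I} \theta_i\, q_1(i, \cdot) = x^* Q_1,$$
so $x^*$ and $x^\dagger$ result in identical expected rewards to Player 1 and Player 2 in $\Gamma_Q$, and $(x^\dagger, y^*)$ is an equilibrium of $\Gamma_Q$. But, given that $\Gamma_Q$ contains two distinct completely mixed equilibria $(x^*, y^*)$ and $(x^\dagger, y^*)$, it cannot be a weakly completely mixed game. After using a similar argument for the other incompetence matrix $Q_2$, the desired result follows by contraposition.

Theorem 1 Let $\Gamma_Q$ be an incompetent bimatrix game with incompetence matrices $Q_1$ and $Q_2$.
(i) If $(x^* Q_1, y^* Q_2)$ is an equilibrium in $\Gamma$ for some $(x^*, y^*) \in X \times Y$, then $(x^*, y^*)$ is an equilibrium in $\Gamma_Q$.
(ii) The profile $(x^* Q_1, y^* Q_2)$ is a (completely mixed) equilibrium in $\Gamma$ whenever $\Gamma_Q$ is weakly completely mixed and $(x^*, y^*)$ is a completely mixed equilibrium in $\Gamma_Q$.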

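Proof (i) Assume that Player 1 possesses a profitable deviation from $(x^*, y^*)$ in $\Gamma_Q$, that is, some $x \in X$ with $v_1^Q(x, y^*) > v_1^Q(x^*, y^*)$. Then $v_1(x Q_1, y^* Q_2) > v_1(x^* Q_1, y^* Q_2)$ and, since $x Q_1 \in X$, Player 1 also possesses a profitable deviation from $(x^* Q_1, y^* Q_2)$ in $\Gamma$, which is a contradiction. After repeating a similar argument for Player 2, we obtain that $(x^*, y^*)$ is an equilibrium in $\Gamma_Q$.
(ii) We know that Player 1's strategy $x^*$ makes Player 2 indifferent between their actions in $\Gamma_Q$, hence
$$x^* Q_1 R_2 Q_2^T = v_2^Q(x^*, y^*)\, 1_n,$$
where $1_n \in \mathbb{R}^{1 \times n}$ is an all-ones row vector. Because each row of $Q_2$ sums to one, we have $1_n Q_2^T = 1_n$, so this equation is solved when $x^* Q_1 R_2 = v_2^Q(x^*, y^*)\, 1_n$, and this solution is unique as $Q_2$ is non-singular (by Lemma 1). This shows that $x^* Q_1$ makes Player 2 indifferent between their actions in $\Gamma$ and, by a similar argument, $y^* Q_2$ makes Player 1 indifferent between their actions in $\Gamma$. Note that $x^* Q_1$ and $y^* Q_2$ are both completely mixed because the entries of $x^*$ and $y^*$ are strictly positive and, by non-singularity, no column of $Q_1$ or $Q_2$ consists entirely of zeros. Thus, appealing to the indifference principle, we conclude that $(x^* Q_1, y^* Q_2)$ is an equilibrium in $\Gamma$.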
Corollary 1, which states that a completely mixed incompetent matrix game achieves the same game value as its competent counterpart, was originally presented by Beck and Filar [10]. Although they give a utility-centred argument based on Shapley and Snow's [66] game value formula, we give an alternative strategy-centred argument based on Theorem 1(ii). Note that a generalisation of this result to bimatrix games is presented in [9]; however, we still choose to highlight the matrix game version for later discussion.
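For comparison, a utility-centred route in the spirit of the Shapley and Snow formula can be sketched in a few lines. This is a minimal derivation under stated assumptions, not a reproduction of the argument in [10]: assume that $\Gamma_Q$ is a completely mixed matrix game whose reward matrix $R^Q := Q_1 R Q_2^T$ is square and non-singular with $\mathrm{val}(\Gamma_Q) \neq 0$, that $Q_1$ and $Q_2$ are non-singular (Lemma 1), and that $\Gamma$ also has a completely mixed equilibrium (Theorem 1(ii)). Here $1_n$ is the all-ones row vector.

```latex
% Indifference in \Gamma_Q: x^* R^Q = \mathrm{val}(\Gamma_Q)\, 1_n, hence
x^* = \mathrm{val}(\Gamma_Q)\, 1_n (R^Q)^{-1}
\;\Longrightarrow\;
1 = x^* 1_n^T = \mathrm{val}(\Gamma_Q)\, 1_n (R^Q)^{-1} 1_n^T
\;\Longrightarrow\;
\mathrm{val}(\Gamma_Q) = \frac{1}{1_n (R^Q)^{-1} 1_n^T}.

% Row-stochasticity: Q_k 1_n^T = 1_n^T, so Q_k^{-1} 1_n^T = 1_n^T and 1_n (Q_k^T)^{-1} = 1_n.
1_n (R^Q)^{-1} 1_n^T
  = 1_n (Q_2^T)^{-1} R^{-1} Q_1^{-1} 1_n^T
  = 1_n R^{-1} 1_n^T.

% The same indifference argument in \Gamma gives \mathrm{val}(\Gamma) = 1 / (1_n R^{-1} 1_n^T), hence
\mathrm{val}(\Gamma_Q) = \mathrm{val}(\Gamma).
```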
Corollary 1 [10] If Γ Q is a completely mixed incompetent matrix game, then Γ is a matrix game and val(Γ ) = val(Γ Q ).
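Proof Certainly, $\Gamma$ is also a matrix game because $Q_1$ and $Q_2$ are non-singular (by Lemma 1, since a completely mixed game is weakly completely mixed), so
$$R_1 = Q_1^{-1} R_1^Q (Q_2^T)^{-1} = -Q_1^{-1} R_2^Q (Q_2^T)^{-1} = -R_2.$$
Moreover, if $(x^*, y^*)$ is the (completely mixed) equilibrium of $\Gamma_Q$, then Theorem 1(ii) shows that $(x^* Q_1, y^* Q_2)$ is an equilibrium of $\Gamma$, so $\mathrm{val}(\Gamma) = v_1(x^* Q_1, y^* Q_2) = v_1^Q(x^*, y^*) = \mathrm{val}(\Gamma_Q)$.

Lastly, the result in Theorem 1(ii) can be extended to incompetent bimatrix games that are "almost" weakly completely mixed. Theorem 2 shows that, if $\Gamma_Q$ can be approximated by a sequence of weakly completely mixed incompetent games, then $\Gamma$ has an equilibrium in $E_1 \times E_2$.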

Lemma 2 If Γ Q is a weakly completely mixed incompetent bimatrix game, then Γ is also weakly completely mixed.
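Proof We know from Theorem 1(ii) that there exists a completely mixed equilibrium $(x^*, y^*) \in E_1 \times E_2$ of $\Gamma$, so it lies in the interior of $E_1 \times E_2$. If $\Gamma$ is not weakly completely mixed, then there exists another equilibrium $(x^\dagger, y^\dagger) \in X \times Y$ such that $(x^*, y^*)$ and $(x^\dagger, y^\dagger)$ belong to the same maximal Nash subset. Define the convex combination $(x_\alpha, y_\alpha)$ of these strategy profiles by
$$(x_\alpha, y_\alpha) := (1 - \alpha)(x^*, y^*) + \alpha (x^\dagger, y^\dagger), \qquad \alpha \in [0, 1].$$
Note that, because maximal Nash subsets are convex, $(x_\alpha, y_\alpha)$ is an equilibrium of $\Gamma$ for every $\alpha \in [0, 1]$. Moreover, for some $\alpha^* \in (0, 1]$, the strategy profile $(x_\alpha, y_\alpha)$ lies in the interior of $E_1 \times E_2$ for every $\alpha \in [0, \alpha^*)$. But, by Theorem 1(i) and Lemma 1, this means that $(x_\alpha Q_1^{-1}, y_\alpha Q_2^{-1})$ is a completely mixed equilibrium of $\Gamma_Q$ for every $\alpha \in [0, \alpha^*)$. Given that this contradicts the uniqueness of a completely mixed equilibrium in the weakly completely mixed game $\Gamma_Q$, we conclude that $\Gamma$ must also be weakly completely mixed.

Theorem 2 Let $\Gamma_Q$ be an incompetent bimatrix game and suppose that there exist sequences of incompetence matrices $Q_1^\ell \to Q_1$ and $Q_2^\ell \to Q_2$ such that $\Gamma_{Q^\ell}$ is weakly completely mixed for each $\ell = 1, 2, \ldots$. Then $\Gamma$ has a completely mixed equilibrium in $E_1 \times E_2$, and it is executed by an equilibrium of $\Gamma_Q$.

Proof Take the sequences of strategies $\{x^*_\ell\}_{\ell=1}^\infty \subset X$ and $\{y^*_\ell\}_{\ell=1}^\infty \subset Y$ such that, for each $\ell = 1, 2, \ldots$, the strategy profile $(x^*_\ell, y^*_\ell)$ is the unique completely mixed equilibrium of $\Gamma_{Q^\ell}$. Moreover, let $(x^\dagger, y^\dagger)$ be the unique completely mixed equilibrium of $\Gamma$, which we know to be weakly completely mixed by Lemma 2. Then, applying Theorem 1(ii), we have $x^*_\ell Q_1^\ell = x^\dagger$ and $y^*_\ell Q_2^\ell = y^\dagger$ for each $\ell = 1, 2, \ldots$.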
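Note that, because the strategy spaces $X$ and $Y$ are compact, there exist subsequences $\{x^*_{\ell_s}\}_{s=1}^\infty$ and $\{y^*_{\ell_t}\}_{t=1}^\infty$ that converge to some strategies $x^* \in X$ and $y^* \in Y$, respectively. Clearly,
$$x^* Q_1 = \lim_{s \to \infty} x^*_{\ell_s} Q_1^{\ell_s} = x^\dagger \quad \text{and} \quad y^* Q_2 = \lim_{t \to \infty} y^*_{\ell_t} Q_2^{\ell_t} = y^\dagger.$$
This shows that $(x^* Q_1, y^* Q_2)$ is a completely mixed equilibrium of $\Gamma$ and, by Theorem 1(i), $(x^*, y^*)$ is an equilibrium of $\Gamma_Q$, as required.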

Variational Properties
Now, we return to the dynamic setting where Γ Q (λ, μ) denotes a family of incompetent games parameterised by a pair of learning trajectories. A central focus in the development of incompetence has been the variational properties of Γ Q (λ, μ) when Γ is a matrix game or a bimatrix game. Here, we will summarise what is known about the behaviour of these incompetent games under variations in the players' learning parameters.
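Beck et al. [9] study the dependence of equilibrium-induced expected rewards on the players' learning parameters. They present Theorem 3 and Theorem 4, showing that, under certain conditions on $Q_1(\lambda)$ and $Q_2(\mu)$, the expected rewards granted by a specific extreme equilibrium have useful representations.

Theorem 3 [9] Assume that $Q_1(\lambda)$ and $Q_2(\mu)$ are linear, that is, $Q_1(\lambda) = (1 - \lambda) Q_1(0) + \lambda Q_1(1)$ and $Q_2(\mu) = (1 - \mu) Q_2(0) + \mu Q_2(1)$ for each $(\lambda, \mu) \in \Lambda \times \mathcal{M}$. Then the expected reward granted to each player by a specific extreme equilibrium of $\Gamma_Q(\lambda, \mu)$ is a ratio of bivariate polynomials in $\lambda$ and $\mu$.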
Theorem 4 [9] Assume that $Q_1(\lambda)$ and $Q_2(\mu)$ are linear with initially uniform incompetence, that is, $Q_1(0) = \frac{1}{m} J_m$ and $Q_2(0) = \frac{1}{n} J_n$. Then the expected reward of an extreme equilibrium, as in Theorem 3, depends (at most) linearly on $\lambda$ (or on $\mu$).
Furthermore, in addition to proving specialisations of Theorem 3 and Theorem 4 for matrix games, Beck and Filar [10] establish several other properties of the game value of a parameterised incompetent matrix game $\Gamma_Q(\lambda, \mu)$. Specifically, they prove that the function $(\lambda, \mu) \mapsto \mathrm{val}(\Gamma_Q(\lambda, \mu))$ is continuous but not necessarily monotone in $\lambda$ and $\mu$. It is also shown that a player can never achieve a greater reward than under complete competence; that is, when complete competence is reached at $\lambda = 1$ and $\mu = 1$,
$$\mathrm{val}(\Gamma_Q(\lambda, 1)) \leq \mathrm{val}(\Gamma_Q(\lambda, \mu)) \leq \mathrm{val}(\Gamma_Q(1, \mu))$$
for all $\lambda, \mu \in [0, 1]$. Beck and Filar [10] also briefly address the plateauing game values of some parameterised incompetent matrix games (see, for example, Fig. 1) by noting that Corollary 1 might apply when a player approaches complete competence. The tools developed in Sect. 2.3 allow us to explore this observation further. Consider the set
$$C := \{(\lambda, \mu) \in [0, 1] \times [0, 1] : \Gamma_Q(\lambda, \mu) \text{ is completely mixed}\}$$
of learning parameters on which $\Gamma_Q(\lambda, \mu)$ is completely mixed. Assume that the learning trajectories $Q_1(\lambda)$ and $Q_2(\mu)$ are continuous. Then, given that the set of reward matrices belonging to completely mixed matrix games is open (see Jansen [33]), the set $C$ is also (relatively) open. Theorem 2 shows that, for each $(\lambda, \mu) \in C$, the players are both able to execute a completely competent optimal strategy in $\Gamma_Q(\lambda, \mu)$. This means that, by an argument identical to that of Corollary 1, the function $(\lambda, \mu) \mapsto \mathrm{val}(\Gamma_Q(\lambda, \mu))$ is constant on $C$ and equal to $\mathrm{val}(\Gamma)$ there. Hence, we expect a game value plateau to emerge whenever $\Gamma_Q(\lambda, \mu)$ becomes completely mixed.

Incremental Learning
Next, we demonstrate a simple model of incremental learning in a parameterised family of incompetent matrix games $\Gamma_Q(\lambda, \mu)$. This incremental learning game $\Gamma_{\mathrm{inc}}$ is a stochastic game unfolding over an infinite time horizon $T = \{0, 1, 2, \ldots\}$ in which, between repeated plays of an incompetent game, the players may choose to increment their learning parameters through the ordered sets $\Lambda := \{\lambda_1, \lambda_2, \ldots, \lambda_M\}$ and $\mathcal{M} := \{\mu_1, \mu_2, \ldots, \mu_N\}$. It is assumed that $\lambda_i < \lambda_{i+1}$ and $\mu_j < \mu_{j+1}$ for each $i = 1, 2, \ldots, M - 1$ and $j = 1, 2, \ldots, N - 1$. This means that a player's skill parameter can never be decreased or, informally, that a player can halt but never reverse the process of learning. Henceforth, we simplify notation by identifying $i$ with $\lambda_i$ and $j$ with $\mu_j$. Now, we give a precise description of $\Gamma_{\mathrm{inc}}$ using the language and notation associated with stochastic games in [18]. The state space $S := \{1, \ldots, M\} \times \{1, \ldots, N\}$ is chosen to index the learning parameters $\Lambda \times \mathcal{M}$. Fix a stage $t \in T$ and a state $s = (i, j) \in S$ such that $\lambda_i$ and $\mu_j$ are the learning parameters belonging to Player 1 and Player 2 at stage $t$. Player 1 and Player 2 (optimally) play the incompetent game $\Gamma_Q(i, j)$ and are given the option to advance their learning parameters to $i + 1$ and $j + 1$, respectively. The decision to increment a learning parameter might incur a state-dependent learning cost $c_k(i, j)$ to Player $k \in \{1, 2\}$. Formally, we say that the actions belonging to Player 1 and Player 2 at state $s$ are
$$A(s) := \begin{cases} \{0, 1\}, & i < M, \\ \{0\}, & i = M, \end{cases} \qquad B(s) := \begin{cases} \{0, 1\}, & j < N, \\ \{0\}, & j = N, \end{cases}$$
where "0" means "Don't Learn" and "1" means "Learn". If Player 1 selects $a \in A(s)$ and Player 2 selects $b \in B(s)$, then they receive the stage-$t$ immediate rewards
$$r_1(s, a, b) := \mathrm{val}(\Gamma_Q(i, j)) - a\, c_1(i, j) \quad \text{and} \quad r_2(s, a, b) := -\mathrm{val}(\Gamma_Q(i, j)) - b\, c_2(i, j),$$
where the $\mathrm{val}(\Gamma_Q(i, j))$ term is the reward received after optimally playing $\Gamma_Q(i, j)$. Moreover, before the subsequent $(t + 1)$th stage, the game transitions to the state $(i + a, j + b)$ with the (degenerate) transition probabilities
$$p(s' \mid s, a, b) := \begin{cases} 1, & s' = (i + a, j + b), \\ 0, & \text{otherwise}, \end{cases}$$
for every $s' \in S$. The general transition structure of this game is shown in Fig. 3.
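Here, we focus on stationary strategies, which are represented as block row vectors $\mathbf{f} = (\mathbf{f}(s))_{s \in S}$ for Player 1 and $\mathbf{g} = (\mathbf{g}(s))_{s \in S}$ for Player 2. The block $\mathbf{f}(s) = (f(s, a))_{a \in A(s)}$ stores the probability $f(s, a)$ of choosing action $a \in A(s)$, and the block $\mathbf{g}(s) = (g(s, b))_{b \in B(s)}$ stores the probability $g(s, b)$ of choosing action $b \in B(s)$. The sets of stationary strategies of Player 1 and Player 2 are denoted by $F$ and $G$, respectively. The immediate rewards in (20) and the transition probabilities in (21) are extended to $F \times G$ by defining

$$r_k(s, \mathbf{f}, \mathbf{g}) = \sum_{a \in A(s)} \sum_{b \in B(s)} f(s, a)\, r_k(s, a, b)\, g(s, b), \qquad p(s' \mid s, \mathbf{f}, \mathbf{g}) = \sum_{a \in A(s)} \sum_{b \in B(s)} f(s, a)\, p(s' \mid s, a, b)\, g(s, b),$$

for each $(\mathbf{f}, \mathbf{g}) \in F \times G$. If the stochastic process $\{S_t\}_{t=0}^{\infty}$ records the state at each stage $t \in T$, then it becomes a Markov chain under the dynamics induced by a strategy profile $(\mathbf{f}, \mathbf{g}) \in F \times G$. We use $\mathbb{P}_{s\mathbf{f}\mathbf{g}}$ and $\mathbb{E}_{s\mathbf{f}\mathbf{g}}$ to denote probabilities and expectations under these dynamics with the initial state $S_0 = s$, and we write

$$v_k(s, \mathbf{f}, \mathbf{g}) := \mathbb{E}_{s\mathbf{f}\mathbf{g}}\Big[\sum_{t=0}^{\infty} \beta^t\, r_k(S_t, \mathbf{f}, \mathbf{g})\Big]$$

for the discounted value to Player $k$, with discount factor $\beta \in [0, 1)$. Then, $(\mathbf{f}^*, \mathbf{g}^*) \in F \times G$ is a (Nash) equilibrium of the incremental learning game $\Gamma_{inc}$ whenever

$$v_1(s, \mathbf{f}^*, \mathbf{g}^*) \ge v_1(s, \mathbf{f}, \mathbf{g}^*) \quad \text{and} \quad v_2(s, \mathbf{f}^*, \mathbf{g}^*) \ge v_2(s, \mathbf{f}^*, \mathbf{g})$$

for all $s \in S$, $\mathbf{f} \in F$, and $\mathbf{g} \in G$.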
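Although $\Gamma_{inc}$ unfolds over an infinite time horizon, its transition structure admits a specialised backward induction algorithm for computing equilibria. We construct a suitable notion of "past" and "future" states by finding an enumeration $s_1, s_2, \dots, s_L$ of the $L = MN$ states in $S$ such that $p(s_m \mid s_\ell, a, b) = 0$ whenever $m < \ell$, for all $a \in A(s_\ell)$ and $b \in B(s_\ell)$. It is straightforward to verify that a suitable ordering exists, for example, the lexicographical ordering, since every transition in (21) weakly increases both $i$ and $j$. So, we shall assume that an ordering has been fixed and write $\ell$ instead of $s_\ell$. Lemma 3 shows that the discounted value of a strategy profile at a specific state does not depend on the "past" states. This allows us to restrict the stochastic game $\Gamma_{inc}$ to the limited state space $\{\ell, \ell + 1, \dots, L\}$ while still being able to assess the value of strategies.

Lemma 3 For each $k \in \{1, 2\}$, $(\mathbf{f}, \mathbf{g}) \in F \times G$ and $\ell \in \{1, 2, \dots, L\}$,

$$v_k(\ell, \mathbf{f}, \mathbf{g}) = \frac{r_k(\ell, \mathbf{f}, \mathbf{g}) + \beta \sum_{m=\ell+1}^{L} p(m \mid \ell, \mathbf{f}, \mathbf{g})\, v_k(m, \mathbf{f}, \mathbf{g})}{1 - \beta\, p(\ell \mid \ell, \mathbf{f}, \mathbf{g})}. \tag{26}$$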
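Proof Observe that, by conditioning on the state $S_1$ after the first transition, the discounted value of $(\mathbf{f}, \mathbf{g})$ is

$$\begin{aligned} v_k(\ell, \mathbf{f}, \mathbf{g}) &= \mathbb{E}_{\ell\mathbf{f}\mathbf{g}}\Big[\sum_{t=0}^{\infty} \beta^t r_k(S_t, \mathbf{f}, \mathbf{g})\Big] \\ &\overset{*}{=} r_k(\ell, \mathbf{f}, \mathbf{g}) + \beta \sum_{m=\ell}^{L} p(m \mid \ell, \mathbf{f}, \mathbf{g})\, \mathbb{E}_{m\mathbf{f}\mathbf{g}}\Big[\sum_{t=0}^{\infty} \beta^t r_k(S_t, \mathbf{f}, \mathbf{g})\Big] \\ &\overset{**}{=} r_k(\ell, \mathbf{f}, \mathbf{g}) + \beta \sum_{m=\ell}^{L} p(m \mid \ell, \mathbf{f}, \mathbf{g})\, v_k(m, \mathbf{f}, \mathbf{g}). \end{aligned}$$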
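Note that the above equality $\overset{*}{=}$ can be verified by applying the definition of $r_k(\ell, \mathbf{f}, \mathbf{g})$ and appealing to the fact that $\{S_t\}_{t=0}^{\infty}$ is a Markov chain. Similarly, the equality $\overset{**}{=}$ holds by applying the definition of $v_k(m, \mathbf{f}, \mathbf{g})$. We now easily obtain (26) by rearranging to isolate the $v_k(\ell, \mathbf{f}, \mathbf{g})$ term on the left-hand side; the sums start at $m = \ell$ because transitions to "past" states have zero probability, and the denominator $1 - \beta p(\ell \mid \ell, \mathbf{f}, \mathbf{g})$ is positive because $\beta < 1$.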
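The recursion in (26) can be checked numerically. The following Python sketch (assuming NumPy; the random transition matrix, the six states and $\beta = 0.99$ are illustrative choices, not part of the model) evaluates $v_k$ state by state from the last state backwards, as Lemma 3 prescribes, and compares the result with a direct solution of the linear system $v = r + \beta P v$ for a chain with no transitions to "past" states.

```python
import numpy as np

def lemma3_values(r, P, beta):
    """Backward recursion of Lemma 3 for one player's reward vector r.

    Assumes the states are ordered so that P is upper triangular,
    i.e. P[l, m] = p(m | l, f, g) = 0 whenever m < l.
    """
    L = len(r)
    v = np.zeros(L)
    for l in range(L - 1, -1, -1):            # last state first
        future = P[l, l + 1:] @ v[l + 1:]     # only "future" states contribute
        v[l] = (r[l] + beta * future) / (1.0 - beta * P[l, l])
    return v

def direct_values(r, P, beta):
    """Reference answer: solve (I - beta P) v = r as one linear system."""
    return np.linalg.solve(np.eye(len(r)) - beta * P, r)

rng = np.random.default_rng(0)
L, beta = 6, 0.99
P = np.triu(rng.random((L, L)))               # no transitions to "past" states
P /= P.sum(axis=1, keepdims=True)             # rows become probability vectors
r = rng.random(L)
print(np.allclose(lemma3_values(r, P, beta), direct_values(r, P, beta)))  # True
```

Because each state only needs the values of later states, the backward pass costs one division per state rather than a full linear solve.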
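The useful consequence of Theorem 5 is that, by solving a "local" problem at the "previous" state $\ell$, we can extend the equilibrium $(\mathbf{f}^*_{\ell+1}, \mathbf{g}^*_{\ell+1})$ to create $(\mathbf{f}^*_\ell, \mathbf{g}^*_\ell)$. This local problem resembles a repeated game with absorbing states. Namely, if both players choose to forgo learning, then the game remains at state $\ell$. Otherwise, if either of the players chooses to learn, then the game transitions into a new state where the expected future rewards are fixed by $(\mathbf{f}^*_{\ell+1}, \mathbf{g}^*_{\ell+1})$. The rewards given to Player $k \in \{1, 2\}$ in this repeated game with absorbing states are

$$\rho_k(a, b) = \begin{cases} r_k(\ell, 0, 0), & (a, b) = (0, 0), \\ r_k(\ell, a, b) + \beta\, v_k(\ell_{ab}, \mathbf{f}^*_{\ell+1}, \mathbf{g}^*_{\ell+1}), & (a, b) \ne (0, 0), \end{cases}$$

for each $(a, b) \in A(\ell) \times B(\ell)$, where $\ell_{ab}$ denotes the state $(i + a, j + b)$ reached from $\ell = (i, j)$. An immediate consequence of Lemma 3 is that, for each $k \in \{1, 2\}$ and each $(\mathbf{f}_\ell, \mathbf{g}_\ell)$ that agrees with $(\mathbf{f}^*_{\ell+1}, \mathbf{g}^*_{\ell+1})$ on the states $\ell + 1, \dots, L$,

$$v_k(\ell, \mathbf{f}_\ell, \mathbf{g}_\ell) = \frac{\sum_{a, b} p_a\, q_b\, \rho_k(a, b)}{1 - \beta\, p_0\, q_0},$$

where $p_a = f_\ell(\ell, a)$ and $q_b = g_\ell(\ell, b)$. Hence, to find $(\mathbf{f}^*_\ell, \mathbf{g}^*_\ell)$ that satisfies the inequalities in (27), we need to solve the coupled pair of maximisation problems

$$p^*_0 \in \arg\max_{p_0 \in [0, 1]} v_1(\ell, p_0, q^*_0), \qquad q^*_0 \in \arg\max_{q_0 \in [0, 1]} v_2(\ell, p^*_0, q_0), \tag{30}$$

where $p^*_0 = 1 - p^*_1 = f^*(\ell, 0)$ and $q^*_0 = 1 - q^*_1 = g^*(\ell, 0)$. Under the additional assumption that this repeated game with absorbing states is non-degenerate, the solutions are either both pure strategies ($p^*_0, q^*_0 \in \{0, 1\}$) or both completely mixed strategies ($p^*_0, q^*_0 \in (0, 1)$). The pure-strategy solutions can be found by imposing the restriction $p_0, p^*_0, q_0, q^*_0 \in \{0, 1\}$ in (30); that is, by comparing the payoffs of every possible pure strategy profile. Moreover, setting the appropriate partial derivatives (with respect to $p_0$ and $q_0$, respectively) to zero gives

$$\frac{\partial}{\partial p_0} v_1(\ell, p_0, q_0) = 0, \qquad \frac{\partial}{\partial q_0} v_2(\ell, p_0, q_0) = 0. \tag{31}$$

The solutions to (31) with $p^*_0, q^*_0 \in (0, 1)$ give the completely mixed strategy solutions to $\Gamma_{inc}$. This shows that we are always able to extend $(\mathbf{f}^*_{\ell+1}, \mathbf{g}^*_{\ell+1})$ to an equilibrium $(\mathbf{f}^*_\ell, \mathbf{g}^*_\ell)$ of $\Gamma_{inc}$ restricted to $\ell, \dots, L$. Hence, since $(\mathbf{f}^*_L, \mathbf{g}^*_L)$ with $f^*(L, 0) = g^*(L, 0) = 1$ is the only strategy profile available at state $L$, we can work backwards through the states $L - 1, L - 2, \dots, 2, 1$ and repeatedly extend it until obtaining an equilibrium $(\mathbf{f}^*, \mathbf{g}^*)$ of $\Gamma_{inc}$.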
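A minimal Python sketch of this backward induction is given below. It is an illustration under assumptions, not the implementation behind Fig. 4: the stage rewards are assumed to be $r_1 = \mathrm{val}(\Gamma_Q(i, j)) - a\,c_1$ and $r_2 = -\mathrm{val}(\Gamma_Q(i, j)) - b\,c_2$; the function `val` is a hypothetical stand-in for the value of the incompetent matrix game (computing it would require solving $\Gamma_Q(i, j)$, for example by linear programming); learning costs are constant; and states are indexed from 0. Because each player's local value is a ratio of linear functions of their own probability of not learning, it is monotone in that probability, so checking deviations to pure actions suffices in the pure case, and the completely mixed case reduces to indifference conditions, which is what (31) expresses.

```python
import itertools
import numpy as np

def local_value(rho, p0, q0, beta):
    """Lemma 3 at the current state: sum_ab p_a q_b rho(a, b) / (1 - beta p0 q0)."""
    w = {(0, 0): p0 * q0, (0, 1): p0 * (1 - q0),
         (1, 0): (1 - p0) * q0, (1, 1): (1 - p0) * (1 - q0)}
    return sum(w[k] * rho[k] for k in w) / (1.0 - beta * p0 * q0)

def indifference_root(A, B, C, D, beta):
    """Root x in (0, 1) of (x A + (1 - x) B) / (1 - beta x) = x C + (1 - x) D."""
    coeffs = [beta * (C - D), (A - B) - (C - D) + beta * D, B - D]
    roots = [z.real for z in np.roots(coeffs)
             if abs(z.imag) < 1e-12 and 0.0 < z.real < 1.0]
    if not roots:
        raise ValueError("degenerate local game: no completely mixed solution")
    return roots[0]

def solve_incremental_learning(val, M, N, c1, c2, beta):
    """Backward induction over states (i, j), 0 <= i < M, 0 <= j < N.

    val(i, j) is a hypothetical stand-in for val(Gamma_Q(i, j)).
    Returns values v1, v2 and eq[(i, j)] = (p0, q0), the equilibrium
    probabilities of "Don't Learn" for Player 1 and Player 2.
    """
    v1, v2, eq = {}, {}, {}
    # Reverse lexicographic order: every successor (i + a, j + b) is solved first.
    for i, j in sorted(itertools.product(range(M), range(N)), reverse=True):
        A = (0, 1) if i < M - 1 else (0,)
        B = (0, 1) if j < N - 1 else (0,)
        rho1, rho2 = {}, {}
        for a, b in itertools.product((0, 1), repeat=2):
            if a in A and b in B:
                x1 = val(i, j) - a * c1          # assumed reward of Player 1
                x2 = -val(i, j) - b * c2         # assumed reward of Player 2
                if (a, b) != (0, 0):             # leave the state for a later one
                    x1 += beta * v1[(i + a, j + b)]
                    x2 += beta * v2[(i + a, j + b)]
            else:
                x1 = x2 = 0.0                    # infeasible pair, always weight 0
            rho1[(a, b)], rho2[(a, b)] = x1, x2
        P0 = [1.0 - a for a in A]                # feasible values of p0
        Q0 = [1.0 - b for b in B]                # feasible values of q0
        found = None
        for p0, q0 in itertools.product(P0, Q0):  # pure equilibria first
            ok1 = all(local_value(rho1, p0, q0, beta)
                      >= local_value(rho1, p, q0, beta) - 1e-12 for p in P0)
            ok2 = all(local_value(rho2, p0, q0, beta)
                      >= local_value(rho2, p0, q, beta) - 1e-12 for q in Q0)
            if ok1 and ok2:
                found = (p0, q0)
                break
        if found is None:  # completely mixed: each player makes the other indifferent
            q0 = indifference_root(rho1[0, 0], rho1[0, 1], rho1[1, 0], rho1[1, 1], beta)
            p0 = indifference_root(rho2[0, 0], rho2[1, 0], rho2[0, 1], rho2[1, 1], beta)
            found = (p0, q0)
        eq[(i, j)] = found
        v1[(i, j)] = local_value(rho1, *found, beta)
        v2[(i, j)] = local_value(rho2, *found, beta)
    return v1, v2, eq

v1, v2, eq = solve_incremental_learning(lambda i, j: float(i), M=3, N=1,
                                        c1=0.1, c2=0.1, beta=0.99)
print(eq[(0, 0)])  # (0.0, 1.0)
```

In the usage example only Player 1 can learn ($N = 1$) and the hypothetical game value grows with Player 1's skill, so learning pays off at every state: the initial state returns `(0.0, 1.0)`, meaning that Player 1 learns with probability 1.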

Example (attack-defence game with incremental learning)
Lastly, recalling the attack-defence game $\Gamma_Q(\lambda, \mu)$ previously introduced in Sect. 2.2, suppose that the attacking and defending pilots have the option to undergo navigation training between engagements. We might model this as an incremental learning game $\Gamma_{inc}$ in which training allows the pilots to advance their skill parameters through $\Lambda = \{0, 1/5, 2/5, 3/5, 4/5, 1\}$ and $\mathcal{M} = \{0, 1/5, 2/5, 3/5, 4/5, 1\}$ after paying learning costs of $c_1(i, j) = c_2(i, j) = 1/10$ at state $s = (i, j) \in S$. Moreover, assume that the pilots have far-sighted discounted strategy valuations with a discount factor of $\beta = 99/100$. What are the best strategies to reduce incompetence throughout this game?
The aforementioned backward induction algorithm produces a unique equilibrium of Γ inc shown graphically in Fig. 4. A node indicates a pair of learning parameters and an arc indicates a transition realised by the equilibrium. So, a vertical arrow means that only Player 1 learns, a horizontal arrow means that only Player 2 learns, a diagonal arrow means that both players learn, and a loop means that neither player learns.
Note that, under the equilibrium shown in Fig. 4 /47, 1]. This means that, once the players have achieved learning parameters within these intervals, the game value plateaus (see Fig. .1) and there is no incentive to learn further. Therefore, complete competence is not always necessary, provided that the players are able to "mimic" it by executing an optimal strategy of the completely competent game Γ.

Incompetence in Biological Populations
Game theory as a mathematical paradigm has found applications not only in economics and behavioural studies, but also in biology. Its first application to biology was driven by the puzzling fact that animal contests rarely result in fights or serious injuries, even though contestants are sufficiently equipped to engage in an open fight [68]. It was suggested that, instead of considering individuals as players who may not be rational, selection itself could be considered a rational force of evolution, and that survival of the entire population is more important than benefits to individual members. Since then, evolutionary game theory has emerged as a branch of game theory and ecological sciences that studies evolution under selection pressure [28,50,56].
Recently, the effects of environmental changes on the evolution of biological populations have become one of the main foci of the field [3,26,75,81]. Since all organisms live in dynamic, changing environments, the ability to adapt becomes key to survival. Adaptation is a process that improves the survival skills and reproductive functions of species, and it usually includes two components: genetic adaptation and learning. As a specific example, when a population migrates or its environmental conditions change, its responses to new environmental stimuli may differ, introducing behavioural mistakes into individuals' interactions. The concept of incompetence was proposed in [37] to address the learning aspect of the evolution of social behaviour. Under the assumption of incompetence of individuals, behaviours that were likely to be observed in the old environment might not have the same frequency in the new environment, and, as organisms adapt, they might re-learn their previous behaviours.

Evolutionary Games
Naturally, game assumptions in biological settings differ from those of classic games, since rationality of individual behaviour cannot always be assumed. Consider a population of a species consisting of $N$ individual organisms. At every time step, individuals interact pairwise, and each has to choose one out of $n$ distinct available actions. The outcomes of these interactions determine the fitness of individuals through the fitness matrix $R \in \mathbb{R}^{n \times n}$. In evolutionary settings, all individuals in the population share the same fitness matrix $R$; however, during an interaction, Player 1's fitness is determined by $R$, while Player 2's fitness is determined by $R^T$. Furthermore, the sets of selectable and executable actions coincide for all players. Let $x = (x_1, \dots, x_n)$, where $x_i$ denotes the frequency of the (pure) strategy $i$. We assume that, in a given population, all individuals share the same set of selectable actions $A$, the same fitness matrix $R$, and the mixed strategy $x$ of the entire population.
The main focus of evolutionary games is to predict the strategy $x$ that will be adopted by the population. Since $n$ actions are available to each individual, the resulting mixed strategies lie in the simplex

$$\Delta_n := \Big\{ x \in \mathbb{R}^n : x_i \ge 0,\ \sum_{i=1}^{n} x_i = 1 \Big\}, \qquad x_i = \frac{N_i}{N},$$

with $N_i$ being the number of individuals adopting strategy $i$ and $N$ being the total number of individuals in the population. Then, an evolutionary game can be denoted by $\Gamma_e = \langle A, R, x \rangle$. We say that the population adopts the pure $i$th strategy if all individuals behave as the $i$th type and, hence, their behavioural frequency vector is the unit basis vector $e_i$. However, this need not be the case. If not, we are dealing with mixed strategies $x$, and hence we are interested in finding a mixture $x^*$ that is a stable outcome of evolution.
It was shown that the concept of Nash equilibrium is not sufficient when taking into account the evolution of populations [67]. As a result, a new equilibrium concept was proposed. The evolutionary stable strategy (ESS) ensures that the population's strategy is resistant to invasion by random mutations, and it is defined more precisely below.
Definition 1 A mixed strategy $x^*$ is called an evolutionary stable strategy if, for every $y \in \Delta_n$ with $y \ne x^*$, one of the following conditions holds:

$$\text{(i)}\ \ y R (x^*)^T < x^* R (x^*)^T, \qquad \text{or} \qquad \text{(ii)}\ \ y R (x^*)^T = x^* R (x^*)^T \ \text{ and } \ y R y^T < x^* R y^T.$$

Here, $x^* R (x^*)^T$ measures the frequency-dependent fitness of the entire population, given that everyone adopts strategy $x^*$, whereas $y R (x^*)^T$ measures the fitness of an individual adopting strategy $y$ in a population of individuals using strategy $x^*$. In the long run, an ESS guarantees that selection favours $x^*$ over any other arising strategy. Note that every ESS is a Nash equilibrium, so the ESS is a refinement of the Nash concept [56].
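Definition 1 quantifies over every mutant $y$, so it cannot be verified exhaustively by computer, but for two strategies a grid over $y = (p, 1 - p)$ gives a quick sanity check. The Python sketch below (assuming NumPy; the grid size, the tolerance and the Hawk-Dove-like matrix are illustrative) tests conditions (i) and (ii) for every grid mutant.

```python
import numpy as np

def is_ess_2x2(R, x_star, grid=2001, tol=1e-12):
    """Grid check of Definition 1 for a two-strategy game (a sanity check, not a proof)."""
    R = np.asarray(R, dtype=float)
    x = np.asarray(x_star, dtype=float)
    incumbent = x @ R @ x                       # x* R (x*)^T
    for p in np.linspace(0.0, 1.0, grid):
        y = np.array([p, 1.0 - p])              # mutant strategy
        if np.allclose(y, x):
            continue
        mutant = y @ R @ x                      # y R (x*)^T
        if mutant < incumbent - tol:            # condition (i)
            continue
        if abs(mutant - incumbent) <= tol and y @ R @ y < x @ R @ y - tol:
            continue                            # condition (ii)
        return False
    return True

R = np.array([[0.0, 3.0], [1.0, 2.0]])          # Hawk-Dove-like example (illustrative)
print(is_ess_2x2(R, [0.5, 0.5]), is_ess_2x2(R, [1.0, 0.0]))  # True False
```

For this matrix, $R (x^*)^T = (1.5, 1.5)^T$, so every mutant ties under (i) and the mixed strategy passes via (ii), whereas the pure strategy $(1, 0)$ is invaded by any mutant that plays the second strategy with positive probability.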
However, besides equilibria, we are usually interested in how these equilibria can be reached, which brings us to the concept of evolutionary dynamics. Since biological populations not only interact but also reproduce, the reproduction process has to be taken into account. The first classic evolutionary dynamics model, called the replicator dynamics, was proposed by Taylor and Jonker [74]. These dynamics assume well-mixed, infinitely large populations, which is, of course, a simplification. Subsequently, many other dynamics were suggested in order to capture mutations [12,57,70,72], finite population size and stochasticity [31,53,73,76-78], adaptation [13,14,17,25,58], and population structure [5,49,59,60]. However, to date, the concept of incompetence has only been considered in the classic setting of replicator dynamics. In the Conclusions section, we discuss possible extensions of incompetent games to other forms of dynamics.
Replicator dynamics capture frequency-dependent selection, where the evolution of the population's strategy depends on the current frequencies of all strategies in the population. That is, the fitness of a particular strategy, which is determined by the adopted strategies, is compared to the mean fitness of the entire population. With respect to a mixed strategy $x \in \Delta_n$, the expected fitness of a (pure) strategy $i$ is defined by

$$f_i(x) = \sum_{j=1}^{n} r_{ij} x_j = (R x^T)_i. \tag{33}$$

The mean fitness payoff of the population is then defined by the scalar

$$\phi(x) = \sum_{i=1}^{n} x_i f_i(x) = x R x^T. \tag{34}$$

Then, the dynamics of strategy $i$'s frequency in the population are defined by

$$\dot{x}_i = x_i \big( f_i(x) - \phi(x) \big), \quad i = 1, \dots, n, \tag{35}$$

or, in matrix form, $\dot{x}^T = \mathrm{diag}(x) \big( R x^T - \phi(x) \mathbf{1}_n \big)$. The folk theorem of evolutionary game theory states that every Nash equilibrium of the game $\Gamma_e$ is an equilibrium of the replicator dynamics, that every Lyapunov stable equilibrium of the dynamics is a Nash equilibrium, and that a strict Nash equilibrium is asymptotically stable [28]. Moreover, any ESS is an asymptotically stable equilibrium of the replicator dynamics. Hence, when considering evolutionary games, it is frequently sufficient to find equilibria of a static game $\Gamma_e$. This simplification is useful when trying to predict how the behaviour of the game changes under the assumption that interacting individuals are incompetent. We next consider how incompetence changes the game setup.
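The replicator dynamics can be integrated directly. The following Python sketch (assuming NumPy; the forward-Euler scheme, the step size and the example matrix are illustrative choices) uses the row-vector convention of the text, so $f = R x^T$ and $\phi = x R x^T$. For the matrix below, $f_1 - f_2 = 1 - 2x_1$, so the interior point $(1/2, 1/2)$ attracts every interior orbit.

```python
import numpy as np

def replicator_rhs(x, R):
    """Right-hand side x_i (f_i(x) - phi(x)), with f = R x^T."""
    f = R @ x          # expected fitness of every pure strategy
    phi = x @ f        # mean fitness x R x^T
    return x * (f - phi)

def simulate(R, x0, dt=0.01, steps=5000):
    """Forward-Euler integration: simple, adequate for a small illustration."""
    R = np.asarray(R, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * replicator_rhs(x, R)
    return x

R = np.array([[0.0, 3.0], [1.0, 2.0]])   # illustrative 2x2 game with a mixed ESS
x_final = simulate(R, [0.2, 0.8])
print(np.round(x_final, 3))  # [0.5 0.5]
```

Since $\sum_i x_i (f_i - \phi) = 0$ on the simplex, each Euler step also preserves $\sum_i x_i = 1$ up to rounding.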

Evolutionary Games under Incompetence
When introducing the assumption that individuals are prone to making behavioural mistakes in an evolutionary game, one can interpret such mistakes as a form of behavioural plasticity. In some ways, this can be seen as phenotypic plasticity (for instance, in microbes). However, in more sophisticated organisms, behavioural plasticity need not relate to the genetic background of the organism. These behavioural mistakes can be driven by migration to a new environment or by any other form of environmental change, and they are reflected in an incompetence matrix $Q$ analogous to that introduced in Sect. 2.2.
Since we assume that the entire population shares only one fitness matrix, we also assume that the incompetence matrix $Q$ is given for the entire population. Then, a new incompetent fitness matrix is determined in a similar manner to (7) as

$$R_Q = Q R Q^T.$$

In line with previous sections, we assume that the players' ability to improve their strategy execution is determined by some parameter. Since here we consider a single population of players, all of whom share the same measure of incompetence, we only need one incompetence parameter $\lambda \in [0, 1]$. Then, the incompetent fitness matrix is defined as

$$R(\lambda) = Q(\lambda)\, R\, Q(\lambda)^T.$$

Throughout this section, we make a specific assumption on the functional form of learning. We assume that $Q(\lambda)$ is linear and defined as

$$Q(\lambda) = (1 - \lambda) S + \lambda I,$$

where $S$ is the starting level of incompetence and $I$ is the identity matrix. When $\lambda = 1$, the population makes no execution errors and has perfect strategy execution. Now we can define the evolutionary incompetent game as $\Gamma^e_Q = \langle A, R(\lambda), x \rangle$. We can further simplify the analysis by utilising the property that the replicator dynamics are invariant under adding a constant to all entries of a column of the fitness matrix [27]. This allows us to reduce the fitness matrix by subtracting the diagonal elements of $R$ from the corresponding columns. Mathematically, such a transformation can be defined as

$$\tilde{R} = R - \mathbf{1}_n d_R^T, \tag{40}$$

where $d_R$ is the vector of the diagonal elements of $R$ and $\mathbf{1}_n$ is the vector of ones. Throughout the manuscript, we denote by $\tilde{R}$ the canonical form of a matrix $R$ as in (40). Then, according to (33)-(35), for the new game under incompetence $\Gamma^e_Q$, we rewrite the expected fitness of strategy $i$ as

$$f_i(x, \lambda) = \sum_{j=1}^{n} \tilde{r}_{ij}(\lambda)\, x_j = \big(\tilde{R}(\lambda) x^T\big)_i, \tag{41}$$

and the mean fitness payoff of the population as

$$\phi(x, \lambda) = x \tilde{R}(\lambda) x^T. \tag{42}$$

Hence, the incompetent replicator dynamics can be written as

$$\dot{x}_i = x_i \big( f_i(x, \lambda) - \phi(x, \lambda) \big), \quad i = 1, \dots, n. \tag{43}$$

In a strict sense, the new system given by (43) is a perturbed evolutionary game, and the perturbations depend on the parameter $\lambda$. As $\lambda$ tends to 1, the game under incompetence approaches the original game given by $R$.
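The construction of the incompetent fitness matrix and its canonical form takes only a few lines. The Python sketch below (assuming NumPy; the rock-paper-scissors matrix, the uniform $S$ and $\lambda = 0.4$ are illustrative) builds $Q(\lambda) = (1 - \lambda)S + \lambda I$, $R(\lambda) = Q(\lambda) R Q(\lambda)^T$ and $\tilde{R} = R - \mathbf{1}_n d_R^T$, and confirms that replacing $R(\lambda)$ by its canonical form leaves the right-hand side of the replicator dynamics unchanged.

```python
import numpy as np

def Q_of(lam, S):
    """Linear learning Q(lambda) = (1 - lambda) S + lambda I."""
    S = np.asarray(S, dtype=float)
    return (1.0 - lam) * S + lam * np.eye(len(S))

def incompetent_R(R, S, lam):
    """Incompetent fitness matrix R(lambda) = Q(lambda) R Q(lambda)^T."""
    Q = Q_of(lam, S)
    return Q @ np.asarray(R, dtype=float) @ Q.T

def canonical(R):
    """Canonical form: subtract each diagonal entry r_jj from column j."""
    R = np.asarray(R, dtype=float)
    return R - np.outer(np.ones(len(R)), np.diag(R))

def replicator_rhs(x, R):
    """x_i (f_i - phi) with f = R x^T and phi = x R x^T."""
    f = R @ x
    return x * (f - x @ f)

R = np.array([[0.0, -1.0, 2.0], [2.0, 0.0, -1.0], [-1.0, 2.0, 0.0]])  # rock-paper-scissors
S = np.full((3, 3), 1.0 / 3.0)   # uniform starting incompetence (illustrative)
R_lam = incompetent_R(R, S, 0.4)
x = np.array([0.2, 0.3, 0.5])
print(np.allclose(replicator_rhs(x, R_lam), replicator_rhs(x, canonical(R_lam))))  # True
```

The check works because subtracting $r_{jj}$ from column $j$ lowers every $f_i(x)$ by the same amount $\sum_j r_{jj} x_j$, and lowers the mean fitness $\phi(x)$ by exactly the same amount.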
In the following section, we summarise the main results obtained for incompetent evolutionary games.

Equilibria Transitions
Here, we are mostly interested in the behaviours that the dynamics exhibit as the parameter $\lambda$ varies, given the starting level of incompetence $S$ and the fitness matrix $R$. Different behaviours may arise for different values of $\lambda$, and the dynamics change their behaviour at critical levels $\lambda_c$ of $\lambda$, referred to as bifurcation points, where equilibria emerge, disappear or change their stability properties.

Definition 2 [37]
A critical value $\lambda_c$ of the incompetence parameter is a bifurcation point of the replicator dynamics.
Under incompetence, the game dynamics may exhibit several bifurcations [37]. Since, by design, the incompetence parameter approaches 1 as incompetence decreases, the incompetent fitness matrix $R(\lambda)$ approaches the original fitness matrix $R$. As a result, in the limit of perfect competence, the behaviour of the incompetent game approaches the behaviour of the original game. That is, there exists a maximal critical value of $\lambda$ beyond which the robust properties of the game are preserved. We recall this result in the following theorem.

Theorem 6 [37] If the game $\tilde{R}$ possesses an ESS $x^*$ and $\|Q(\lambda) - I\| \le \delta(\lambda_u)$, where $\lambda_u = \max \lambda_c$ is the maximal critical value of the incompetence parameter for the fixed point $x^*$, then the incompetent game $\tilde{R}(\lambda)$, for $\lambda \in (\lambda_u, 1]$, possesses an ESS $x^*(\lambda)$, and

$$\lim_{\lambda \to 1} x^*(\lambda) = x^*.$$

A natural question is how these bifurcation values of the incompetence parameter can be determined, and how the dynamics behave at them. The larger the game (the more strategies it has), the harder it becomes to identify all possible bifurcations. However, even for an arbitrary number of strategies, we can find bifurcations of special equilibria, such as interior equilibria or pure-strategy equilibria, using the analysis presented in [11]. Let us first focus on the bifurcations of interior equilibria.
Definition 3 [37] Let $x^*$ be a fixed point and $\lambda_c$ a bifurcation point that is also a zero of the mean fitness, namely, $\phi(x^*, \lambda_c) = 0$. Then, $\lambda_c$ is called a balanced bifurcation parameter value.
Then, the bifurcation point of an interior equilibrium can be found by considering the determinant of the incompetent fitness matrix.

Lemma 4 [37] If $x^*$ is an interior fixed point, that is, $x^*_i > 0$ for all $i$, then every balanced bifurcation parameter value $\lambda_c$ is also a singular point of $\tilde{R}(\lambda)$ in the sense that $\det(\tilde{R}(\lambda_c)) = 0$.
Next, we recall that the canonical form $\tilde{R}(\lambda)$ is defined through a rank-one transformation of the incompetent fitness matrix $R(\lambda)$. By [24], whenever $R(\lambda)$ is invertible, its determinant can be written as

$$\det\big(\tilde{R}(\lambda)\big) = \det\big(Q(\lambda)\big)^2 \det(R) \big(1 - d_{R(\lambda)}^T R(\lambda)^{-1} \mathbf{1}_n\big). \tag{45}$$

Hence, critical values of the incompetence parameter can be found by finding zeroes of either $\det(Q(\lambda))$ or $1 - d_{R(\lambda)}^T R(\lambda)^{-1} \mathbf{1}_n$. In the special case of a rock-paper-scissors game [40], stability of the interior equilibrium is determined by the sign of the determinant of the fitness matrix [80], which gives rise to three cases: (a) if $\det(R) < 0$, then an unstable interior equilibrium exists and trajectories approach a heteroclinic cycle; (b) if $\det(R) > 0$, then the interior equilibrium is a stable mixed equilibrium; (c) if $\det(R) = 0$, then the interior equilibrium is a centre surrounded by periodic orbits.
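Moreover, the games $\tilde{R}(\lambda)$ and $R(\lambda)$ exhibit the same replicator dynamics [27]. Since $\det(R(\lambda)) = \det(Q(\lambda))^2 \det(R)$, the determinant of $R(\lambda)$ always has the same sign as $\det(R)$ whenever $Q(\lambda)$ is non-singular; as a result, $\det(\tilde{R}(\lambda))$ also cannot change its sign while the interior equilibrium exists.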
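Both the determinant factorisation and the search for critical values are easy to reproduce numerically. The Python sketch below assumes NumPy and the matrix-determinant-lemma form $\det(\tilde{R}(\lambda)) = \det(Q(\lambda))^2 \det(R)\,(1 - d_{R(\lambda)}^T R(\lambda)^{-1} \mathbf{1}_n)$, valid when $R(\lambda)$ is invertible. The example pairs the anti-coordination game $R = [[0, 1], [1, 0]]$ with the hypothetical starting matrix $S = [[1, 0], [1, 0]]$, in which strategy 2 is always executed as strategy 1 at $\lambda = 0$. By hand, $\det(\tilde{R}(\lambda)) = -\lambda^2 (2\lambda - 1)$, so the only critical value in $(0, 1]$ is $\lambda_c = 1/2$; the scan starts just above 0 to skip the trivial root there.

```python
import numpy as np

def incompetent_R(R, S, lam):
    """R(lambda) = Q R Q^T with Q(lambda) = (1 - lambda) S + lambda I."""
    Q = (1.0 - lam) * S + lam * np.eye(len(S))
    return Q @ R @ Q.T

def det_canonical(R, S, lam):
    """Determinant of the canonical form of R(lambda), computed directly."""
    Rl = incompetent_R(R, S, lam)
    return np.linalg.det(Rl - np.outer(np.ones(len(Rl)), np.diag(Rl)))

def det_lemma(R, S, lam):
    """Same determinant via the matrix determinant lemma (R(lambda) invertible)."""
    Q = (1.0 - lam) * S + lam * np.eye(len(S))
    Rl = incompetent_R(R, S, lam)
    ones = np.ones(len(Rl))
    correction = 1.0 - np.diag(Rl) @ np.linalg.solve(Rl, ones)
    return np.linalg.det(Q) ** 2 * np.linalg.det(R) * correction

def critical_values(R, S, grid=1000, tol=1e-12):
    """Sign changes of det(canonical R(lambda)) on (0, 1], refined by bisection."""
    lams = np.linspace(1e-6, 1.0, grid)
    vals = [det_canonical(R, S, l) for l in lams]
    roots = []
    for lo, hi, flo, fhi in zip(lams[:-1], lams[1:], vals[:-1], vals[1:]):
        if flo * fhi < 0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                fmid = det_canonical(R, S, mid)
                if flo * fmid <= 0:
                    hi = mid             # root lies in [lo, mid]
                else:
                    lo, flo = mid, fmid  # root lies in [mid, hi]
            roots.append(0.5 * (lo + hi))
    return roots

R = np.array([[0.0, 1.0], [1.0, 0.0]])
S = np.array([[1.0, 0.0], [1.0, 0.0]])   # hypothetical starting incompetence
roots = critical_values(R, S)
print(roots)
```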
Deriving a general form of the equilibria as functions of $\lambda$ is complex and depends on the form of the matrices $R$ and $S$. For the special case of uniform incompetence, in which, at the starting level, every chosen action is executed as each of the $n$ actions with the same probability $1/n$, we can sometimes find a closed-form expression for the interior equilibrium. Uniform incompetence can be interpreted as a form of plasticity in biological populations; for instance, phenotypic plasticity, where different types might have slight variations in the exact degree of expression of each gene. We provide this result in the following theorem.
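Theorem 7 [40] Let $x^*$ be an interior ESS for $\tilde{R}$. If the starting level of incompetence $S$ is a uniform matrix, that is, $s_{ij} = 1/n$ for all $i, j = 1, \dots, n$, then, for $\lambda$ sufficiently close to 1,

$$x^*(\lambda) = \frac{1}{\lambda} \Big( x^* - \frac{1 - \lambda}{n} \mathbf{1}_n^T \Big)$$

is an interior ESS for the game $\tilde{R}(\lambda)$.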
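The closed form in Theorem 7 can be sanity-checked. For uniform $S$, $Q(\lambda)$ is row-stochastic and $Q(\lambda)^T x^*(\lambda)^T = (x^*)^T$, so $R(\lambda)\, x^*(\lambda)^T = Q(\lambda) R (x^*)^T$ has equal entries whenever $R (x^*)^T$ does. The Python sketch below (assuming NumPy; the $2 \times 2$ matrix with interior ESS $x^* = (2/3, 1/3)$ is illustrative) only checks that $x^*(\lambda)$ is an interior rest point, not the ESS property, and shows that for small $\lambda$ the formula leaves the simplex.

```python
import numpy as np

def theorem7_point(x_star, lam):
    """Candidate equilibrium (x* - (1 - lam)/n) / lam for uniform S."""
    x_star = np.asarray(x_star, dtype=float)
    return (x_star - (1.0 - lam) / len(x_star)) / lam

def is_interior_rest_point(R_lam, x, tol=1e-10):
    """Interior rest point of the replicator dynamics: x > 0 and equal fitnesses."""
    f = R_lam @ x
    return bool(np.all(x > 0) and f.max() - f.min() < tol)

R = np.array([[0.0, 2.0], [1.0, 0.0]])   # canonical form, interior ESS (2/3, 1/3)
x_star = np.array([2.0 / 3.0, 1.0 / 3.0])
n = len(R)
S = np.full((n, n), 1.0 / n)             # uniform starting incompetence
for lam in (0.2, 0.6, 0.9):
    Q = (1.0 - lam) * S + lam * np.eye(n)
    x_lam = theorem7_point(x_star, lam)
    print(lam, np.round(x_lam, 4), is_interior_rest_point(Q @ R @ Q.T, x_lam))
```

At $\lambda = 0.2$ the second component of the formula is negative, which is why the theorem is stated for $\lambda$ close to full competence.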
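In [11], it was shown that the stability properties of a pure-strategy equilibrium $e_j$ can be determined from the signs of the off-diagonal entries in the $j$th column of the matrix $\tilde{R}(\lambda)$, since, at the vertex $e_j$, the eigenvalues of the linearised replicator dynamics are exactly $\tilde{r}_{lj}(\lambda)$, $l \ne j$. Hence, given the maximal level of incompetence, we can determine which of the vertices will be stable and when this stability will change.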

Theorem 8 [40] If

$$(s_l - s_j)^T R\, s_j < 0 \quad \text{for all } l \ne j,$$

then vertex $j$ is a stable point of the replicator dynamics with execution errors for $\lambda \in [0, \lambda_c)$, where $\lambda_c$ is the smallest critical value of the incompetence parameter at which $\tilde{r}_{lj}(\lambda)$ changes its sign for some $l \ne j$.
This result can be generalised to any level of incompetence, where we have to consider the condition

$$(q_l - q_j)^T R\, q_j < 0 \quad \text{for all } l \ne j$$

for all levels of $\lambda$, where $q_i$ is the $i$th row of $Q(\lambda)$. Generally speaking, this condition means that, for a pure strategy to be stable under incompetence, it has to be a strict best response to itself, with execution errors taken into account, compared with all other pure strategies.
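The sign test behind Theorem 8 and its generalisation can be automated. The Python sketch below (assuming NumPy; the Prisoner's-Dilemma-style matrix, the uniform $S$ and the grid are illustrative) builds the canonical form of $R(\lambda)$, declares vertex $j$ stable when every off-diagonal entry $\tilde{r}_{lj}(\lambda) = (q_l - q_j)^T R\, q_j$ in column $j$ is negative, and scans $\lambda$ to report where the set of stable vertices changes. Zero entries are treated as not stable, so degenerate cases such as $\lambda = 0$ with uniform $S$ report no stable vertex.

```python
import numpy as np

def canonical_incompetent(R, S, lam):
    """Canonical form of R(lambda) = Q R Q^T, with Q = (1 - lam) S + lam I."""
    n = len(R)
    Q = (1.0 - lam) * S + lam * np.eye(n)
    Rl = Q @ R @ Q.T
    return Rl - np.outer(np.ones(n), np.diag(Rl))

def stable_vertices(R, S, lam):
    """Vertices j whose off-diagonal column entries r~_lj(lambda) are all negative."""
    Rt = canonical_incompetent(R, S, lam)
    n = len(Rt)
    return [j for j in range(n)
            if all(Rt[l, j] < 0 for l in range(n) if l != j)]

def stability_changes(R, S, grid=1001):
    """Scan lambda over [0, 1] and record where the set of stable vertices changes."""
    changes, previous = [], None
    for lam in np.linspace(0.0, 1.0, grid):
        current = stable_vertices(R, S, lam)
        if current != previous:
            changes.append((float(lam), current))
            previous = current
    return changes

R = np.array([[3.0, 0.0], [5.0, 1.0]])   # strategy 2 (index 1) dominates
S = np.full((2, 2), 0.5)                 # uniform starting incompetence
print(stability_changes(R, S))
```

For this matrix, vertex 1 is stable for every $\lambda \in (0, 1]$, so the scan reports a single change just after $\lambda = 0$; richer games, such as the rock-paper-scissors examples of Fig. 5, produce several intervals.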
As in [38], Fig. 5 illustrates how the stability of equilibria can change as the competence level of the population changes, for the example of an unstable rock-paper-scissors game. In every panel, the colour-coded bar at the top indicates which of the three vertices are stable, while the main plot depicts the interior equilibrium components as functions of λ. Even in this simple game, the behaviour exhibited by the replicator dynamics in response to changing λ can be very rich. As these three examples show, the interior equilibrium may or may not exist for different values of the incompetence parameter. Similarly, one, two, three or none of the vertices may be stable at the same time.
As in the case of classical games, in evolutionary settings it is natural to consider decreasing levels of incompetence, a process we call learning. Note that increasing λ corresponds to a greater skill level and decreasing incompetence.
In the evolutionary games setup, dynamic incompetence has been interpreted from two different perspectives: as an environmental shift that requires adaptation from organisms before the stable equilibrium is reached, and as a learning process designed to maximise the fitness of the population after the population has stabilised at some equilibrium.
The behaviour exhibited by the population dynamics when the process of learning is treated as a function of time, λ(t), was considered in [36,38]. There are many possible choices for the functional form. So far, two functional forms of λ(t) have been analysed: a sigmoid and a periodic function. The sigmoid form of learning implies that organisms learn faster at the beginning of the process and slower once they reach sufficient competence. The assumption of a slowing rate for high enough levels of λ is motivated by the absence of any need to learn fast once the evolutionary stable outcome can already be reached (see Theorem 6).
By analysing the parameters of λ(t), one can determine how long it will take for a species to fully recover in a behavioural sense and to act as it did in the environment it is familiar with. Moreover, while the functional form of the adaptation trajectory captures the pace and steepness of the learning process, the starting level of incompetence can be seen as a measure of the magnitude of the changes in the environment. That is, the further the new habitat is from the previous one, the longer it may take for organisms to fully recover.
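A time-dependent incompetence parameter is straightforward to simulate. The Python sketch below (assuming NumPy) uses the hypothetical saturating curve $\lambda(t) = \tanh(kt)$ with $k = 0.1$, which learns fastest at the start and slows down near full competence; it is a stand-in chosen for illustration, not the functional form used in [36,38]. With uniform $S$ and $R = [[0, 2], [1, 0]]$, the trajectory should settle at the ESS $(2/3, 1/3)$ of the fully competent game once $\lambda(t)$ is close to 1.

```python
import numpy as np

def learning_curve(t, k=0.1):
    """Hypothetical saturating learning curve lambda(t) = tanh(k t), fastest at t = 0."""
    return np.tanh(k * t)

def incompetent_rhs(x, R, S, lam):
    """Incompetent replicator dynamics using the canonical form of R(lambda)."""
    n = len(x)
    Q = (1.0 - lam) * S + lam * np.eye(n)
    Rl = Q @ R @ Q.T
    Rt = Rl - np.outer(np.ones(n), np.diag(Rl))
    f = Rt @ x
    return x * (f - x @ f)

def simulate_with_learning(R, S, x0, T=200.0, dt=0.01):
    """Forward-Euler integration with a time-dependent incompetence parameter."""
    x = np.asarray(x0, dtype=float)
    for step in range(int(round(T / dt))):
        x = x + dt * incompetent_rhs(x, R, S, learning_curve(step * dt))
    return x

R = np.array([[0.0, 2.0], [1.0, 0.0]])   # interior ESS (2/3, 1/3) when lambda = 1
S = np.full((2, 2), 0.5)                 # uniform starting incompetence
x_final = simulate_with_learning(R, S, [0.5, 0.5])
print(np.round(x_final, 3))
```

Swapping `learning_curve` for a periodic function is enough to explore the periodic case discussed next.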
Since natural habitats are subject to some form of regular stochasticity, [36] also considered how periodic environmental fluctuations due to seasonal or daily changes affect the evolutionary dynamics. It turned out that periodicity of environmental changes leads to periodic behaviour in the evolutionary dynamics as well. Specifically, if the original game possesses a stable equilibrium, then the solution of the incompetent game with a periodic form of incompetence converges to a stable periodic orbit around this stable equilibrium.

[Figure: ... intervals of λ where one, two or all three vertices become stable are reported in a colour-coded bar. Then, for λ = 1/2, a phase diagram in Δ_3 of the corresponding replicator dynamics is plotted [32]. The table for each example reports the eigenvalues of the Jacobian of the replicator dynamics at each of the equilibria for λ = 1/2. The interested reader is referred to [38] for the code used to produce plots A, B and C.]
Let us now demonstrate how the concept of incompetence can be applied to a more specific biological setting. In the next section, we formulate a game of two foraging strategies of marine bacteria and try to analyse it from the perspective of incompetence.

Bacterial Motility Game under Incompetence
Evolutionary game theory has been widely applied to studying the evolution of microbes. Despite their simplicity and small size, marine bacteria are among the most ubiquitous marine organisms, playing a central role in governing the health of marine ecosystems and regulating the global biosphere [52]. Understanding how cells make decisions and interact has implications both for the biology of bacterial communities and for our exploitation of these communities [1,20,21,44,46,48]. Fundamental to nutrient competition among bacteria is the choice of motility (chemotactic) strategies. Chemotaxis, the ability to sense environmental signals and react to the stimuli accordingly, has been studied since the late 1800s [16,61].
However, a deterministic game-theoretic approach misses an essential feature of bacterial population dynamics: these populations and their interactions are highly stochastic. For instance, stochastic environmental fluctuations often affect ecological systems [21]. In order to at least partially allow for this, the concept of incompetence was applied in [36] to study the foraging strategies of bacteria by incorporating behavioural stochasticity into a matrix game that captures interactions between different strategic types of microbes. The aim is to identify the most efficient strategy for given environmental conditions. We consider two possible strategies: nonmotile or chemotactic. Nonmotile bacteria cannot swim actively and only drift with the water flow, whereas chemotaxis allows for an active choice of direction. The fitness matrix can be constructed as Nonmotile Chemotactic Nonmotile 1 0 where c is the cost of swimming and m is the reward for being able to efficiently determine the direction of swimming, and both parameters are normalised so that c, m ∈ [0, 1]. Depending on the exact values of the parameters, the game may exhibit four different behaviours, as proposed in [82], in relation to the signs of the matrix elements in the canonical form (40). We shall focus on two cases: when the chemotactic strategy dominates the nonmotile strategy, and when a stable mixed equilibrium exists. The mixed equilibrium is given by When introducing incompetence into a model, one has to take into account the biological limitations of the strategies. For instance, there are no conditions under which a nonmotile bacterium can exhibit chemotaxis, because it lacks the receptors and flagella required for such a strategy. However, a chemotactic bacterium can behave both as nonmotile and as chemotactic.
Hence, the starting incompetence matrix S for this example may be chosen so that a nonmotile bacterium always executes the nonmotile strategy, while a chemotactic bacterium may mistakenly behave as a nonmotile one. This induces the incompetent fitness matrix R̃(λ) in canonical form. Note that the relative fitness of the chemotactic strategy is not affected by incompetence. However, the advantage of the nonmotile strategy depends on the level of incompetence induced by chemotactic bacteria. By using Lemma 4, we can determine the critical value λ_c of the incompetence parameter as the solution of det(R̃(λ)) = 0 or r̃_21(λ_c) = 0. Depending on the stability properties of the dynamics in the original game, the behaviour of equilibria under incompetence will differ. For instance, if the chemotactic strategy dominates in the fully competent game, then for λ < λ_c both strategies stably co-exist. If a stable mixed equilibrium exists, then for λ < λ_c the chemotactic strategy dominates the nonmotile strategy.

[Figure: ... for a given stochastic process λ(t). Second row: the executed or observed frequency of the chemotactic strategy as a function of t and λ. Last row: the corresponding stochastic learning process λ(t).]
Additionally, turbulence affects the life of marine bacteria [71]. Mathematically, this can be modelled via a stochastic adaptation process. We construct a stochastic learning process in which each λ(t) is a random variable whose distribution is determined by the species' migration process. This assumption provides a more realistic interpretation of the species' behaviour when we take into account migration and environmental stochasticity. However, games with an ESS are well known to be robust [11].
We shall compare population dynamics for two types of learning processes: deterministic sigmoid learning and a stochastic migration process. Any learning process, either deterministic or stochastic, will lead to the ESS in terms of a species' choice (Fig. 6). However, the observed behaviour depends on the incompetence matrix, so the population's observed behaviour will differ depending on the learning dynamics.
Stochastic learning leads to a situation where the majority of bacteria in a population are able to perform chemotaxis if the chemotactic strategy was dominating (Fig. 6, left), as in the deterministic case. If, instead, a stable mixed equilibrium exists, then the dynamics under stochastic incompetence also converge to a stable frequency of chemotactic bacteria (Fig. 6, right), although this frequency differs from the equilibrium of the game without incompetence.
Due to incompetence, extinct strategies may still reappear in the behaviour of individuals as a manifestation of mistakes, causing a revival of the extinct types. This randomisation may become beneficial, as a changing environment may require flexibility from individuals in their adaptation.
Even if the adaptive peak has been reached (i.e. λ = 1), behavioural randomisation may be essential for preparedness to unforeseen changes. This is supported by existing research on stochastic phenotype switching, in which bacteria exhibit behavioural stochasticity even in stable environments [30].
When considering incompetent games, the main focus of the analysis is on where the dynamics will stabilise and whether a stable equilibrium will be reached. However, what if the change in the environment happened after the stable equilibrium was already reached? Is there an optimal way to re-learn effective strategies that is least costly in terms of fitness losses? We discuss results answering this question in the next section.

Prioritised Learning
When learning is allowed after the stable equilibrium has been reached, the focus of the analysis is the population's need to re-learn its effective strategies in an optimal manner. Hence, in [39], learning under incompetence was considered with respect to maximising the fitness over the learning path.
When addressing learning, one needs to distinguish whether the entire population learns at a common rate $\lambda$ or whether each strategy has its own learning rate $\lambda_i \in [0, 1]$. This decision depends on the specific situation under consideration. For this section, let us assume the more general case, in which every strategy $i$ has its own incompetence parameter $\lambda_i$ and $\lambda = (\lambda_1, \dots, \lambda_n)$, to define an evolutionary game under incompetence. Then, a performance measure $\Phi_C(\lambda)$ of the fitness over a learning path can be defined in terms of the mean fitness $\phi_C(\lambda)$ of the population along $C$, where $C$ is a learning path that can be taken. Since the complexity of the problem grows with the number of strategies, this model was considered in its simplest possible setup: when only two strategies compete. That is, the fitness matrix $R$ is taken in its two-strategy canonical form, and the starting level of incompetence $S$ is a $2 \times 2$ matrix parameterised by $\eta$ and $\gamma$. If the initial game has one stable pure equilibrium, then optimal learning simply amounts to reducing the frequency with which the unwanted strategy is executed. However, if the game [...] In order to maximise the fitness of the population, it is sufficient to consider the mean fitness function [74], which, under incompetence, reduces to the analysis of two parameters. An important aspect stemming from this model is the understanding of fitness and learning advantages. Given that the relative fitness of each strategy is positive, that is, $a, b > 0$, we say that the strategy with the higher relative fitness has a fitness advantage.
In addition, we say that a strategy may have a learning advantage. This concept is induced by incompetence: the strategy with more variability in its behaviour, that is, with a higher probability of mistakes, has a higher potential fitness advantage that can be realised by reducing incompetence. Hence, the lower the parameters η or γ of S, the greater the corresponding learning advantage. A new parameter δ := ã − b̃ was defined to measure the relative strategic advantage of one strategy over the other. We summarise and compare these concepts in Table 1.
Since one strategy may have a relative strategic advantage over the other, the optimal learning path depends on which strategy holds this advantage. This phenomenon was called prioritised learning.
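The mean fitness that enters the fitness-over-learning can be illustrated with a minimal Python sketch. It rests on our own assumptions rather than the exact formulation of [39]: a two-strategy fitness matrix R (here a hypothetical snowdrift-type matrix with a, b > 0), an incompetence matrix Q(λ) whose row i is (1 − λ_i)S_i + λ_i e_i, an incompetent fitness matrix R(λ) = Q(λ) R Q(λ)^T, and a population sitting at the stable state of R(λ). The function names `incompetence_matrix`, `stable_frequency` and `mean_fitness` are hypothetical.

```python
import numpy as np


def incompetence_matrix(S, lam):
    """Incompetence matrix Q(λ) for per-strategy learning parameters.

    Assumed form: row i is (1 - λ_i) * S_i + λ_i * e_i, so λ_i = 0 keeps the
    starting incompetence S_i and λ_i = 1 means perfect execution.
    """
    S = np.asarray(S, dtype=float)
    lam = np.asarray(lam, dtype=float)[:, None]  # column vector of λ_i
    return (1.0 - lam) * S + lam * np.eye(S.shape[0])


def stable_frequency(M):
    """Frequency of Strategy 1 at the stable state of a 2x2 game M."""
    a_t = M[0, 1] - M[1, 1]  # canonical entry of Strategy 1
    b_t = M[1, 0] - M[0, 0]  # canonical entry of Strategy 2
    if a_t > 0 and b_t > 0:
        return a_t / (a_t + b_t)  # stable mixed equilibrium
    if a_t > 0:
        return 1.0  # Strategy 1 dominates (b_t <= 0)
    if b_t > 0:
        return 0.0  # Strategy 2 dominates (a_t <= 0)
    raise ValueError("bistable or degenerate game: no unique stable state")


def mean_fitness(R, S, lam):
    """Mean fitness x^T R(λ) x at the stable state, with R(λ) = Q R Q^T."""
    Q = incompetence_matrix(S, lam)
    R_lam = Q @ np.asarray(R, dtype=float) @ Q.T  # incompetent fitness matrix
    x1 = stable_frequency(R_lam)
    x = np.array([x1, 1.0 - x1])
    return float(x @ R_lam @ x)


# Hypothetical snowdrift-type example: a = 2, b = 1.
R = [[0.0, 2.0], [1.0, 0.0]]
S = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical starting incompetence
print(mean_fitness(R, S, [0.0, 0.0]))  # before any learning
print(mean_fitness(R, S, [1.0, 1.0]))  # fully competent: close to a*b/(a+b) = 2/3
```

Passing λ as a vector lets each strategy learn at its own pace, as in the λ = (λ_1, ..., λ_n) setting; the stable-state rule used here, however, is specific to two strategies.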

Definition 4
We say that there exists prioritised learning for $\Phi_C(\lambda)$ among stepwise learning paths if there exists a path C* such that one of the directions is preferable to the other, that is, $\Phi_{C_1}(\lambda) \neq \Phi_{C_2}(\lambda)$, where $\Phi_{C_i}(\lambda)$ is the fitness-over-learning along direction of learning i, and C_1, C_2 are the learning paths in directions 1 and 2, respectively.
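The definition can be explored numerically. The sketch below, again under our own assumptions (a hypothetical 2×2 fitness matrix, an incompetence matrix with rows (1 − λ_i)S_i + λ_i e_i, R(λ) = Q(λ) R Q(λ)^T, and a population that tracks the stable state of R(λ)), approximates $\Phi_{C_1}$ and $\Phi_{C_2}$ for the two stepwise paths: C_1 first raises λ_1 from 0 to 1 with λ_2 = 0 and then raises λ_2, while C_2 does the reverse. It does not compute ã or b̃; it only compares the two integrals directly, using the trapezoidal rule on each leg. The function `fitness_over_learning` is hypothetical.

```python
import numpy as np


def incompetence_matrix(S, lam):
    # Assumed form: row i is (1 - λ_i) * S_i + λ_i * e_i.
    S = np.asarray(S, dtype=float)
    lam = np.asarray(lam, dtype=float)[:, None]
    return (1.0 - lam) * S + lam * np.eye(S.shape[0])


def mean_fitness(R, S, lam):
    # x^T R(λ) x at the stable state of the 2x2 game R(λ) = Q R Q^T.
    Q = incompetence_matrix(S, lam)
    M = Q @ np.asarray(R, dtype=float) @ Q.T
    a_t, b_t = M[0, 1] - M[1, 1], M[1, 0] - M[0, 0]
    if a_t > 0 and b_t > 0:
        x1 = a_t / (a_t + b_t)  # stable mixed equilibrium
    elif a_t > 0:
        x1 = 1.0  # Strategy 1 dominates
    elif b_t > 0:
        x1 = 0.0  # Strategy 2 dominates
    else:
        raise ValueError("no unique stable state")
    x = np.array([x1, 1.0 - x1])
    return float(x @ M @ x)


def fitness_over_learning(R, S, first, n=1001):
    """Approximate Φ_C along the stepwise path that learns index `first` first.

    Index 0 is Strategy 1 and index 1 is Strategy 2. Leg 1 raises λ[first]
    from 0 to 1; leg 2 then raises the other λ from 0 to 1. Each leg has
    length 1 and is integrated with the trapezoidal rule on a uniform grid.
    """
    second = 1 - first
    t = np.linspace(0.0, 1.0, n)
    h = t[1] - t[0]
    total = 0.0
    for leg in (0, 1):
        values = np.empty(n)
        for k, tk in enumerate(t):
            lam = np.zeros(2)
            if leg == 0:
                lam[first] = tk
            else:
                lam[first], lam[second] = 1.0, tk
            values[k] = mean_fitness(R, S, lam)
        total += h * (values.sum() - 0.5 * (values[0] + values[-1]))
    return float(total)


R = [[0.0, 2.0], [1.0, 0.0]]  # hypothetical a = 2, b = 1
S = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical starting incompetence
phi_1 = fitness_over_learning(R, S, first=0)  # path C_1
phi_2 = fitness_over_learning(R, S, first=1)  # path C_2
print(phi_1, phi_2, "-> learn Strategy", 1 if phi_1 > phi_2 else 2, "first")
```

In a fully symmetric setup (equal relative fitnesses and equally reliable strategies), relabelling the strategies maps C_1 onto C_2, so the two integrals coincide and neither direction is preferable.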
Interestingly, the sign of δ fully determines which strategy has to be learnt first; the order cannot be determined from either the fitness or the learning advantage of the strategies alone. The naive expectation would be that the skill with the greatest fitness advantage has to be learnt first. However, in the optimal learning path it is the strategy with the lower relative strategic advantage that is learnt first.
Theorem 9 [39] The direction of the optimal learning path is determined by the sign of δ: for δ > 0 the direction of Strategy 2 is optimal, and for δ < 0 the direction of Strategy 1 is optimal. If δ = 0, then the direction of learning makes no difference, that is, $\Phi_{C_1}(\lambda) = \Phi_{C_2}(\lambda)$.
We suggest that natural selection tries to compensate for the most disrupted strategy first, even if its fitness is not the highest. Nonetheless, if the fitness difference is large enough to overcome the effect of incompetence, then optimal learning requires that the better strategy be learnt first. Another possible interpretation is to view the mixed equilibrium as a mixed strategy used by individual players. Then, by learning the less advantageous strategy first, individuals reach the nearest optimal mixed strategy.
In the next section, we illustrate the results of the three previous sections on a reduced two-strategy game based on the foraging strategies of marine bacteria, as presented in [36].

Bacterial Motility Game and Prioritised Learning
Let us now assume that the population has stabilised at the mixed equilibrium defined in (47), and that the environmental conditions have then changed, leading to deviations in the execution of both bacterial strategies. To capture this, assume that the new starting incompetence matrix Ŝ is parameterised by $\varepsilon_1$ and $\varepsilon_2$. Since nonmotile bacteria can exhibit chemotactic behaviour only as random noise, it is natural to assume that $\varepsilon_1 \le \varepsilon_2$. Furthermore, let us allow each strategy to be learnt at a different pace, as in Sect. 3.5. To determine the optimal path that maximises the fitness over learning, we first calculate the advantages ã and b̃ of the nonmotile and chemotactic strategies from (49).
Then, the strategic advantage of the nonmotile strategy over the chemotactic strategy is δ = ã − b̃.
Note that if $\varepsilon_1 = \varepsilon_2 = \varepsilon$, then $\delta = 1/\varepsilon > 0$, and the chemotactic strategy has to be learnt first, in one step (see Fig. 7 (left)). In general, the chemotactic strategy has to be learnt first whenever δ > 0, which, together with the condition $\varepsilon_1 < \varepsilon_2$, requires that m < 2c − 1 2 . We plot the special case δ = 0 in Fig. 7 (right).

Conclusions and Future Extensions
This paper is predicated on the belief that competitions/games with incompetent agents/players are ubiquitous in nature. Hence, formalising the notion of incompetence and modelling the impact of the resulting "mistakes" on the outcomes of games is worthy of detailed analysis. However, we must first recognise that the everyday use of the word "incompetence" carries a very wide range of possible interpretations, which must be narrowed down before it can be rigorously analysed.
Hence, the line of research we surveyed is limited to situations where incompetence can be adequately modelled via probability distributions on specified sets of actions available to one or more players (assumption [A1]). This implies that incompetence-induced mistakes manifest themselves as random outcomes that differ from the intended ones. This certainly captures some essential characteristics of incompetence.
However, in the classical incompetent games studied so far, assumption [A1] was augmented by the requirement that players know one another's propensity to make mistakes. This "mutually known" aspect of the probability distributions of mistaken executions is certainly restrictive. For instance, while it may approximately apply to a match between two professional tennis players at Wimbledon, it would not hold for two children playing each other. We hope that future investigations will relax this restriction.

Extensions for Classical Games
Currently, in the setting of classical non-cooperative game theory, incompetence has been studied mainly in matrix and bimatrix games. However, there are clearly several other, more general, classes of games to which this approach could be extended. Below, we name just four of the many possible generalisations.
a) Continuum of actions. Although we have only dealt with players having finitely many actions, the concept of incompetence could be extended to games with larger action spaces, for example, games with a continuum of actions. In such a game, mixed strategies are represented by cumulative distribution functions and expected utility is computed as a Riemann-Stieltjes integral. Hence, in this context, a general "incompetence-adjusted" utility function (as in (8)) would also need to be expressed in integral form.
b) Incompetence-dependent action spaces. While the original incompetence framework described by Beck et al. [9] allows a player's selectable and executable actions to differ, the theoretical development to date has addressed only the case where they coincide. Intuitively, there are situations where a player's incompetence may contract or expand their set of selectable actions. However, this raises the conceptual challenge of dynamically capturing the changes to these sets as a player reduces his or her incompetence via learning, which would need to be modelled in a sufficiently general and yet technically tractable way.
c) Extensions to stochastic games. In stochastic games evolving over a discrete time horizon, at each stage the players play one of a finite set of non-cooperative games called "states". The consequence of a single play is an immediate payoff (to each player) and a probabilistic transition to a new state (e.g. see the seminal paper [65]). Clearly, it is possible to replace each state by an incompetent non-cooperative game, thereby inducing an incompetent stochastic game. Such a generalisation would be interesting and, likely, tractable.
d) Extensions to incremental learning. The incremental learning games formulated in Sect. 2.5 adopt several simplifying assumptions that could be relaxed to further extend the model. First, the assumption that a player's learning trajectory can be parameterised by a single learning parameter could be relaxed to allow for "multidirectional learning". Second, relaxing the assumption that learning can never be reversed, that is, that a player's level of incompetence can never increase, would allow the model to describe not only the process of learning, but also the process of forgetting what one has learnt.

Extensions for Evolutionary Games
It should be clear that the work done so far on incompetent evolutionary games constitutes merely a beginning. As above, in this section we briefly describe just three of the many possible continuations of this research.
a) Generalisations of population dynamics. First of all, working within the replicator dynamics setting carries simplifying assumptions that open it to criticism for oversimplifying natural reproduction processes. While replicator dynamics is a classical approach to modelling the effect of natural selection, over decades of research these assumptions have been relaxed in new approaches to modelling population dynamics. In particular, the effects of finite population size and of the inherent stochasticity of reproduction were addressed in finite-population dynamics such as the Moran birth-death process, the ability of individuals to imitate successful behaviour of their neighbours was addressed in imitation dynamics, and the effect of interactions with neighbours was addressed in many different dynamics on networks. Hence, as a natural extension, one should consider how these relaxed assumptions on the population dynamics affect the dynamics of games with incompetence.
b) Generalising prioritised learning by exploiting the power of simulations. In recent years, evolutionary modelling has adopted computer simulations to explore more realistic, complex models that would be intractable by analytical methods. Hence, one could extend our prioritised learning of Sect. 3.5 to allow more than two strategies to compete at the same time. Furthermore, it is tempting to give every individual organism its own learning parameter so as to approximate natural scenarios more closely. While such a model would likely be analytically intractable, simulating specific setups may shed light on many puzzling biological problems, for instance, how niches emerge and are filled by organisms interacting with multiple other organisms under many different environmental conditions.
c) Learning as a function of frequency of strategies. One simplifying assumption made so far in all models of incompetence applied in biology is the separation of the learning process from the reproduction process. However, the evolution of learning or of levels of incompetence might be frequency dependent, which could lead to intricate co-evolution. Setups that follow similar logic were considered in [81] and [75]. While the results presented in [75] can be seen as more general, the exact dynamics of the learning process in the setting of incompetence were not addressed in previous works. Hence, we believe it would be worthwhile to consider the co-dependence of x(λ) and λ(t, x).