Search for a moving target in a competitive environment

We consider a discrete-time dynamic search game in which a number of players compete to find an invisible object that is moving according to a time-varying Markov chain. We examine the subgame perfect equilibria of these games. The main result of the paper is that the set of subgame perfect equilibria is exactly the set of greedy strategy profiles, i.e. those strategy profiles in which the players always choose an action that maximizes their probability of immediately finding the object. We also discuss several variations and extensions of the model.


Introduction
In this paper we consider a dynamic search game, in which an object moves according to a time-varying Markov chain across a finite set of states. The set of competing players can be either finite or infinite. At each period, an active player is chosen according to a fixed distribution. The active player can search for the object in one of the possible positions. If the object is found by the active player, this player wins and the game ends. Otherwise, the object moves according to the transition matrix, and the game enters the next period. Each player observes the actions chosen by his opponents, and the transition probabilities, initial probabilities and probabilities of being the active player are known to all players. The goal of each player is to find the object and win the game. So each player prefers the winning outcome and is indifferent between the outcomes in which one of his opponents wins or in which nobody finds the object.
The main result of this paper is that the set of subgame perfect equilibria is exactly the set of greedy strategy profiles, also known as myopic strategy profiles, i.e. those strategy profiles in which the active player always selects one of the positions most likely to contain the object. The key to this result is to show that by playing a greedy strategy, each player can guarantee that he wins with a probability at least as large as his probability of being active in any given period.
The rest of this paper is organized as follows. In Section 1.1 we refer to related literature. In Section 1.2 we introduce the model and in Section 1.3 we present an illustrative example. In Section 2 we study the winning probabilities of players playing a greedy strategy, and we present a characterization of the subgame perfect equilibria. We show that the set of subgame perfect equilibria is exactly the set of greedy strategy profiles. In Section 3 we discuss some extensions of the model and see to what extent the main result still holds. The conclusion is in Section 4.

Related literature
The field of search problems is one of the original disciplines of Operations Research, with various applications such as military problems, R&D problems or patent races, and many of these models involve multiple decision makers. In the basic settings, the searcher's goal is to find a hidden object, also called the target, with maximal probability or as soon as possible. By now, the field of search problems has evolved into a wide range of models. The models in the literature differ from each other by the characteristics of the searchers and of the objects. Concerning objects, there might be one or several objects, mobile or not, and they might have no aim or might aim to avoid being found. Concerning the searchers, there might be one or more. When there is only one searcher, the searcher faces an optimization problem. When there is more than one searcher, the searchers might cooperate or not. If the searchers cooperate, their aim is similar to that in the setting with one searcher: they might want to minimize the expected time of search, the worst time, or some search cost function. If the searchers do not cooperate, the problem becomes a search game with at least two strategic non-cooperative players, and hence game theoretic solution concepts and arguments will play an important role. For an introduction to search games, we refer to [2], [11], [12], [13], [14], and for surveys see [5] and [20].
There are many different types of search games in the literature due to variations in characteristics of the searchers and of the object. [27] introduced a simple search game with one searcher, in which an object is moving across two locations referred to as "state 1" and "state 2" respectively, according to a discrete-time Markov chain. The objective of the searcher can be either to minimize the expected number of looks to find the object, or to maximize the probability of finding the object within a given horizon. The model allows for overlooking probabilities, which means that even if the searcher chooses the correct location, he may fail to find the object there. By contrast, [23] investigates the search problem with three states, while assuming perfect detection, so not accounting for overlooking probabilities. [10] investigates a search game similar to [27], but in addition to searching for the object in state 1 or state 2, the player has a third option which is to wait. This option is costless whereas searching is costly. Waiting could induce a favourable probability distribution over the two states next period. They find a unique optimal strategy characterized by two thresholds. In his PhD thesis, [21] studies the structural properties of the optimal strategy, where the goal is to find the object while minimizing the search costs. He derives some properties of the optimal strategy for the search problem with a finite set of states in the no-overlook case and for the case where each state has the same overlooking probability and cost. [4] investigates a search problem with two states in which the object moves according to a continuous-time Markov process. His objective is to find the object with a minimal expected cost, where the "real time" until the object is found is also taken into account in the cost structure. A competitive environment with several searchers and a static object is considered in [24].
A classical reference for an overview of search games is the book of [1]. For a recent paper on search games, we refer to [15]. They model the search game as a zero-sum two-person stochastic game where the first player is looking for the other one. They provide upper and lower bounds on the value of the game.

The Model
The Game. An object is moving over a finite set $S = \{1, \dots, n\}$ according to a time-varying Markov chain. The initial distribution of the object is given by $\pi = (\pi_s)_{s \in S}$, and the transition probabilities at each period $t \in \mathbb{N}$ are given by the $S \times S$ transition matrix $P_t$, where entry $P_t(s, s')$ is the probability that the object moves to state $s'$, given that it is in state $s$ at period $t$.
Let $I$ denote a set of players, who compete to find the object. We assume that $I \subseteq \mathbb{N}$, so the set $I$ can be either finite or countably infinite. The players do not observe the current state of the object, but they know the initial distribution $\pi$ and the transition matrices $P_t$, for each $t \in \mathbb{N}$. At each period $t \in \mathbb{N}$ one of the players is active: player $i$ is active with probability $q_i > 0$, where $\sum_{i \in I} q_i = 1$. The active player chooses a state $s_t \in S$. If the object is in state $s_t$, then the active player finds the object and wins the game. Otherwise the game enters period $t + 1$. We assume that each player observes the actions chosen by his opponents, and knows the probabilities $q_i$, for each $i \in I$.
The goal of each player is to find the object and win the game. For each player the game has three possible outcomes. The first possible outcome is that the player himself finds the object and wins the game. The second outcome is that one of his opponents finds the object and wins the game. The third outcome is that no one finds the object. In this game each player prefers the first outcome, but is indifferent between the second and third outcomes. This means that players do not have opposite interests, which makes it a non-zero-sum game.
Actions & Histories. The action set for each player $i$ is $A_i = S$. Thus, a history at period $t \in \mathbb{N}$ is a sequence $h_t = (s_1, \dots, s_{t-1}) \in S^{t-1}$ of past actions. By $H_t = S^{t-1}$ we denote the set of all histories at period $t$. Note that $H_1$ consists of the empty sequence. Given a history $h_t$, with the knowledge of the initial probability distribution $\pi$ and the transition matrices $P_1, \dots, P_{t-1}$, the players can calculate the probability distribution of the location of the object at period $t$.
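The belief computation mentioned above can be sketched as follows. This is a minimal illustration and not part of the original model description: the function name `belief_at`, the 0-based state indices, the list-of-matrices layout and the example chain are all assumptions made for this sketch. Starting from the initial distribution, each unsuccessful search conditions the belief on the object not being in the searched state, after which the object moves according to the transition matrix of that period.

```python
from fractions import Fraction


def belief_at(pi, transitions, history):
    """Distribution of the object at period len(history) + 1.

    pi          -- initial distribution (list of length n)
    transitions -- transitions[t - 1][s][s2] plays the role of P_t(s, s2)
    history     -- 0-based states searched without success at periods 1, 2, ...
    """
    z = list(pi)
    n = len(z)
    for t, searched in enumerate(history):
        miss = 1 - z[searched]
        if miss == 0:
            raise ValueError("this history has probability zero")
        # Condition on the object not being in the searched state.
        z = [0 if s == searched else z[s] / miss for s in range(n)]
        # Let the object move according to the matrix of this period.
        P = transitions[t]
        z = [sum(z[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return z


# Hypothetical two-state chain, used only for illustration.
example_pi = [Fraction(1, 4), Fraction(3, 4)]
example_P = [[Fraction(1, 4), Fraction(3, 4)], [Fraction(1), Fraction(0)]]
```

With these hypothetical numbers, an unsuccessful search of the second state at period 1 returns the belief to (1/4, 3/4), whereas an unsuccessful search of the first state leads to the belief (1, 0).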

Strategies. A strategy for player $i$ is a sequence of functions $\sigma_i = (\sigma_{i,t})_{t \in \mathbb{N}}$, where $\sigma_{i,t} : H_t \to \Delta(S)$ and $\Delta(S)$ denotes the set of probability distributions on $S$.
The interpretation is that, at each period $t \in \mathbb{N}$, if player $i$ becomes the active player, then given the history $h_t$, the strategy $\sigma_{i,t}$ recommends to search state $s \in S$ with probability $\sigma_{i,t}(h_t)(s)$. We denote by $\Sigma_i$ the set of strategies of player $i$. We say that a strategy is pure if, for any history, it places probability 1 on one action. A strategy is called greedy if, for any history, it places probability 1 on the set of states that are most likely to contain the object given that history. In the literature the greedy strategy is sometimes also called the myopic strategy.
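As a small sketch of the greedy condition, the following helpers compute the set of most likely states for a given belief and check whether a mixed action is greedy. The names `greedy_states` and `is_greedy`, the 0-based indices and the numerical tolerance `tol` are assumptions of this illustration.

```python
def greedy_states(belief, tol=1e-12):
    """Return the 0-based states whose probability is maximal under `belief`."""
    best = max(belief)
    return [s for s, p in enumerate(belief) if best - p <= tol]


def is_greedy(mixed_action, belief, tol=1e-12):
    """Check that `mixed_action` puts all its probability on most likely states."""
    support = {s for s, p in enumerate(mixed_action) if p > tol}
    return support <= set(greedy_states(belief, tol))
```

When several states are equally likely, any mixture over them passes the check, which is why a game can have many greedy strategy profiles.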
Winning probabilities. Consider a strategy profile $\sigma = (\sigma_i)_{i \in I}$. The probability under $\sigma$ that player $i$ wins is denoted by $u_i(\sigma)$. Note that $\sum_{i \in I} u_i(\sigma) \le 1$, as at most one player finds the object. The aim of player $i$ is to maximize $u_i(\sigma)$. If the object has not been found before period $t$, and the history is $h_t$, the continuation winning probability of player $i$ from period $t$ onward is denoted by $u_i(\sigma)(h_t)$.
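Subgame perfect equilibrium. A strategy $\sigma_i$ for player $i$ is a best response to a profile of strategies $\sigma_{-i} = (\sigma_j)_{j \in I \setminus \{i\}}$ of his opponents if $u_i(\sigma_i, \sigma_{-i}) \ge u_i(\sigma'_i, \sigma_{-i})$ for every strategy $\sigma'_i \in \Sigma_i$. A strategy $\sigma_i$ for player $i$ is a best response in the subgame at history $h$ to a profile of strategies $\sigma_{-i}$ if $u_i(\sigma_i, \sigma_{-i})(h) \ge u_i(\sigma'_i, \sigma_{-i})(h)$ for every strategy $\sigma'_i \in \Sigma_i$. A strategy profile $\sigma = (\sigma_i)_{i \in I}$ is called a subgame perfect equilibrium if, at each history $h$, in the subgame at history $h$ the strategy $\sigma_i$ is a best response to $\sigma_{-i}$ for each player $i \in I$. In other words, $\sigma$ is a subgame perfect equilibrium if it induces an equilibrium in each subgame.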

An illustrative example
In the model we introduced the greedy strategies. However, it is not clear a priori whether those strategies are relevant, nor whether one greedy strategy can be better or worse than another one. We now examine an example in order to answer those questions. Consider the following game with a parameter $c \in (0, 1)$, with two states and two players. At each period player 1 is active with probability $q_1 = 0.99 = 1 - q_2$. The initial probability of the location of the object is $\pi = (c, 1 - c)$ and the transition matrices are defined as follows: for all $t \in \mathbb{N}$, $P_t = \begin{pmatrix} c & 1 - c \\ 1 & 0 \end{pmatrix}$. That is, from state 1 the object stays in state 1 with probability $c$ and moves to state 2 with probability $1 - c$, whereas from state 2 it moves to state 1 with probability 1. The induced Markov chain of this game is depicted in Figure 1.
We discuss two cases.
Case 1. Consider the case in which $c < 1/3$.
Intuitively, if player 2 gets the chance to be active at period 1, he should look in state 2 as his chance to play later is quite low. However, it is not immediately clear what player 1 should do at period 1. On the one hand, if he looks in state 1 and he does not find the object, then he finds it in period 2 with probability 1 if he can play, which is likely to happen as $q_1 = 0.99$. On the other hand he might also simply want to maximise his chance to find the object at period 1 by looking in state 2.
More precisely, assume first that player 2 is active at period 1. If he looks at state 1, then with probability $c$ he finds the object, and with probability $1 - c$ the active player at period 2 can find the object by looking at state 1. In this case, player 2 finds the object with probability $c + q_2 \cdot (1 - c)$. However, if he looks at state 2, then with probability $1 - c$ he finds the object, and with probability $c$ the active player at period 2 can find the object with probability $1 - c$ by looking at state 2. In this case, player 2 finds the object with probability at least $1 - c + q_2 \cdot c \cdot (1 - c) > c + q_2 \cdot (1 - c)$, as $q_2 = 0.01$ and $c < 1/3$. Thus, it is strictly better for player 2 to look at state 2 at period 1. The same holds for each subgame in which the probability distribution of the object is $(c, 1 - c)$. Now assume that player 1 is active at period 1. If he looks at state 1, then similarly to the analysis of player 2, he finds the object with probability $c + q_1 \cdot (1 - c)$. However, if player 1 decides to choose state 2 at each period he is active, then by our analysis of player 2, it follows that player 2 will also do the same. In that case the object is found at each period with probability $1 - c$, and after an unsuccessful search the distribution of the object is again $(c, 1 - c)$. Hence player 1 finds the object with probability $(1 - c) + c \cdot q_1 > c + q_1 \cdot (1 - c)$, where the inequality holds because $c < 1/2$ and $q_1 < 1$. So it is better for player 1 to choose state 2 at period 1.
From our discussion, it follows that every Nash equilibrium induces the play in which at each period the active player chooses state 2. If a deviation occurs to state 1 and the object is not found, the next active player chooses state 1 and wins the game. This is a greedy strategy profile as at each period the most likely state is chosen.
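The comparisons in Case 1 can be checked numerically with exact arithmetic. The sketch below is an illustration under stated assumptions: the function name `case1_payoffs` and the dictionary keys are hypothetical, and the expression `(1 - c) + c * q1` for player 1 searching state 2 is our own computation, assuming that after an unsuccessful search of state 2 the belief returns to (c, 1 - c) and that both players keep searching state 2 afterwards.

```python
from fractions import Fraction


def case1_payoffs(c, q1):
    """Winning probabilities of the player who is active at period 1."""
    q2 = 1 - q1
    return {
        # Player 2 active at period 1.
        "p2_state1": c + q2 * (1 - c),
        "p2_state2_lower_bound": (1 - c) + q2 * c * (1 - c),
        # Player 1 active at period 1.
        "p1_state1": c + q1 * (1 - c),
        "p1_state2": (1 - c) + c * q1,
    }


# Example with c = 1/4 < 1/3 and q1 = 0.99.
values = case1_payoffs(Fraction(1, 4), Fraction(99, 100))
```

For player 1 the gap between the two options equals (1 - 2c)(1 - q1), which is positive whenever c < 1/2, so searching state 2 is the better choice for both players in this case.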
Case 2. Consider the case in which c = 1/2.
In this case, both states are equally likely, so a greedy strategy can choose any state at period 1.
Note that at any period, if state 1 is chosen and the object is not found, a greedy strategy finds the object at the next period in state 1. However, if state 2 is chosen and the object is not found, a greedy strategy can choose either of the two states at the next period. Therefore, there are many greedy strategy profiles, and they can induce different plays. It is then natural to ask whether all greedy strategy profiles induce the same winning probabilities. As we will show later, every greedy strategy profile leads to the winning probability $q_1$ for player 1 and $q_2$ for player 2.

Greedy strategies and Subgame perfect equilibria
In this section we prove our main result, which is that the set of subgame perfect equilibria is exactly the set of greedy strategy profiles. We start by introducing an intermediate result related to the winning probability guarantees of the players who play a greedy strategy.
Proposition 1. Consider a strategy profile $\sigma = (\sigma_j)_{j \in I}$ and a player $i \in I$. If $\sigma_i$ is a greedy strategy, then under $\sigma$, the object is found with probability 1 and player $i$ wins with probability at least $q_i$.
Consequently, if $\sigma$ is a greedy strategy profile, then under $\sigma$, each player $i$ wins with probability $q_i$.
Proof. Let $\sigma = (\sigma_j)_{j \in I}$ be a strategy profile in which player $i$ plays a greedy strategy.
Step 1. Under $\sigma$, the object is found with probability 1.
Proof of Step 1. Consider an arbitrary period $t \in \mathbb{N}$ and suppose that the object has not been found yet. Then, player $i$ plays with probability $q_i$, and when he plays he wins with probability at least $1/n$, because the most likely of the $n$ states contains the object with probability at least $1/n$. This implies that the object is found at period $t$ with probability at least $q_i / n$. Since this holds for each period $t$, under $\sigma$ the object is found with probability 1.
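The last step of this argument can be made explicit. Since the per-period bound $q_i / n$ holds after every history in which the object has not yet been found, the probability of no discovery during the first $T$ periods decays geometrically:

```latex
\Pr\bigl(\text{object not found in periods } 1, \dots, T\bigr)
  \;\le\; \Bigl(1 - \frac{q_i}{n}\Bigr)^{T}
  \;\xrightarrow[T \to \infty]{}\; 0 .
```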
Step 2. Under $\sigma$, player $i$ wins with probability at least $q_i$.
Proof of Step 2. Consider a period $t \in \mathbb{N}$ and a history $h \in H_t$. For each state $s \in S$, let $z_h(s)$ denote the probability that the object is in state $s$ at period $t$ given the history $h$. Let $z^*_h = \max_{s \in S} z_h(s)$.
For each player $j \in I$, let $w_j(h)$ denote the probability that after history $h$ player $j$ becomes the active player and by using the mixed action $\sigma_j(h)$ he wins immediately. Since $\sigma_i$ is a greedy strategy, $w_i(h) = q_i \cdot z^*_h$, whereas $w_j(h) \le q_j \cdot z^*_h$ for every player $j \in I$. Hence, given that the object is found at period $t$ after the history $h$, the conditional probability that player $i$ finds it is
$$\frac{w_i(h)}{\sum_{j \in I} w_j(h)} \ge \frac{q_i \cdot z^*_h}{\sum_{j \in I} q_j \cdot z^*_h} = q_i.$$
Since this holds for every period $t$ and every history $h \in H_t$, and since by Step 1 the object is found under $\sigma$ with probability 1, player $i$ wins with probability at least $q_i$.
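Proposition 1 can be illustrated by an exact computation on a small instance. The sketch below is only an illustration under assumptions: there are two players, the chain is stationary, the opponent follows an arbitrary belief-based rule, the horizon is truncated, and all names (`step`, `greedy`, `win_and_found`) are hypothetical. It returns the probability that the greedy player wins within the horizon together with the probability that the object is found at all, so that the two bounds used in the proof can be compared directly.

```python
from fractions import Fraction


def step(belief, searched, P):
    """Belief after an unsuccessful search of `searched`, followed by one move."""
    n = len(belief)
    miss = 1 - belief[searched]
    cond = [0 if s == searched else belief[s] / miss for s in range(n)]
    return [sum(cond[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


def greedy(belief):
    """A most likely state (lowest index in case of ties)."""
    return max(range(len(belief)), key=lambda s: (belief[s], -s))


def win_and_found(belief, P, q_i, opponent, periods):
    """Return (P(player i wins), P(object is found)) within `periods` periods.

    Player i is greedy and active with probability q_i; otherwise an opponent
    is active and searches the state `opponent(belief)`.
    """
    if periods == 0:
        return Fraction(0), Fraction(0)
    win = found = Fraction(0)
    for prob, action, is_i in ((q_i, greedy(belief), True),
                               (1 - q_i, opponent(belief), False)):
        hit = belief[action]
        win_later = found_later = Fraction(0)
        if hit < 1:  # the game continues only after an unsuccessful search
            nxt = step(belief, action, P)
            win_later, found_later = win_and_found(nxt, P, q_i, opponent, periods - 1)
        win += prob * ((hit if is_i else 0) + (1 - hit) * win_later)
        found += prob * (hit + (1 - hit) * found_later)
    return win, found


# Hypothetical instance: the greedy player is rarely active (q_i = 1/100) and
# the opponent always searches the first state.
c = Fraction(1, 4)
P = [[c, 1 - c], [Fraction(1), Fraction(0)]]
q_i = Fraction(1, 100)
T = 8
win, found = win_and_found([c, 1 - c], P, q_i, lambda b: 0, T)
```

In this instance one can check that `win >= q_i * found` and that `found >= 1 - (1 - q_i / 2) ** T`, which mirror Step 2 and Step 1 of the proof for a truncated horizon.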
Theorem 2. The set of subgame perfect equilibria is exactly the set of greedy strategy profiles.
Proof. The proof is divided in two steps.
Step 1. Every greedy strategy profile is a subgame perfect equilibrium.
Proof of Step 1. Let $\sigma$ be a greedy strategy profile and consider a subgame at a history $h \in H$. We show that $\sigma$ is an equilibrium in this subgame. Consider a player $i$ and a deviation $\sigma'_i$. By Proposition 1, the strategy $\sigma_j$ guarantees to each player $j \in I$ that he wins with probability at least $q_j$, even if another player deviates. Since $\sum_{j \in I} q_j = 1$ and the winning probabilities of the players sum to at most 1, we find $u_i(\sigma)(h) = q_i$ and $u_i(\sigma'_i, \sigma_{-i})(h) \le 1 - \sum_{j \ne i} q_j = q_i$. Thus, the deviation $\sigma'_i$ is not profitable.
Step 2. Every subgame perfect equilibrium is a greedy strategy profile.
Proof of Step 2. Assume by way of contradiction that there exists a subgame perfect equilibrium $\sigma = (\sigma_i)_{i \in I}$ which is not a greedy strategy profile. Suppose that $\sigma_i$ is not greedy at history $h$. Let strategy $\sigma'_i$ be a one-shot deviation from $\sigma_i$ at history $h$, under which player $i$ plays greedy at history $h$.
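Let $z_h(\sigma_i(h))$ denote the probability that player $i$ finds the object immediately with $\sigma_i$ at history $h$ given that he is the active player. Similarly, let $z_h(\sigma_{-i}(h))$ denote the probability that one of the opponents of player $i$ finds the object immediately with $\sigma_{-i}$ at history $h$ given that one of the opponents is active. Then we have
$$u_i(\sigma)(h) = q_i \cdot \bigl[ z_h(\sigma_i(h)) + (1 - z_h(\sigma_i(h))) \cdot q_i \bigr] + (1 - q_i) \cdot (1 - z_h(\sigma_{-i}(h))) \cdot q_i.$$
This equality follows from the following argument. Player $i$ has a probability of $q_i$ of becoming the active player. Then, he wins immediately with probability $z_h(\sigma_i(h))$ or he does not find the object immediately with probability $1 - z_h(\sigma_i(h))$ and still wins in the future with probability $q_i$ by Proposition 1 (in a subgame perfect equilibrium every player $j$ wins with probability at least $q_j$ in each subgame, and these probabilities sum to 1). However, with probability $1 - q_i$ one of his opponents is active. Then, player $i$ still has a chance to win. His opponents fail to find the object immediately with probability $1 - z_h(\sigma_{-i}(h))$ and then player $i$ can again win in the future with probability $q_i$.
Similarly, for the deviation $\sigma'_i$ we have
$$u_i(\sigma'_i, \sigma_{-i})(h) = q_i \cdot \bigl[ z_h(\sigma'_i(h)) + (1 - z_h(\sigma'_i(h))) \cdot q_i \bigr] + (1 - q_i) \cdot (1 - z_h(\sigma_{-i}(h))) \cdot q_i.$$
Since $z_h(\sigma'_i(h)) > z_h(\sigma_i(h))$ holds due to the fact that $\sigma'_i$ is greedy at $h$ but $\sigma_i$ is not, we find $u_i(\sigma'_i, \sigma_{-i})(h) > u_i(\sigma)(h)$. This however contradicts the assumption that $\sigma$ is a subgame perfect equilibrium.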
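The size of the gain from the one-shot deviation can be spelled out. Write $z = z_h(\sigma_i(h))$ and $z' = z_h(\sigma'_i(h))$, and recall that player $i$ wins with probability $q_i$ after any unsuccessful search. The two continuation winning probabilities then differ only in the event that player $i$ is active at $h$. The strict inequality below additionally assumes $q_i < 1$, i.e. that player $i$ has at least one opponent.

```latex
u_i(\sigma'_i, \sigma_{-i})(h) - u_i(\sigma)(h)
  = q_i \bigl[ \bigl(z' + (1 - z')\, q_i\bigr) - \bigl(z + (1 - z)\, q_i\bigr) \bigr]
  = q_i \,(1 - q_i)\,(z' - z) \;>\; 0 .
```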

Extensions & Variations
In this section we consider several extensions of the model and discuss if the main theorem still holds or not.

Extensions where our results still hold
• History dependent transitions. In the model description we assumed that the object moves according to a time-varying Markov chain. A more general situation is when the transition probabilities at each period t can also depend on (i) the sequence of past choices of the players, (ii) the sequence of past active players, and (iii) the sequence of states visited by the object in the past. Without any modification in the proofs, our main result, Theorem 2, remains valid.
• Overlooking. Our main result, Theorem 2, can be extended to a model with overlooking probabilities. Overlooking means that the object is in the chosen state, but the active player "overlooks" it and therefore does not find it. Note that players cannot distinguish between overlooking and searching in a vacant state. For this extension, let $\delta_s < 1$ denote the probability of overlooking the object in state $s$, for each $s \in S$.
The overlooking probabilities need to be taken into account when defining greedy strategies. A greedy strategy should choose a state s that maximizes the probability of containing the object times the probability of not overlooking the object, i.e. it should maximize the immediate probability of finding the object.
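A minimal sketch of the adjusted greedy rule is given below, together with the Bayesian update of the belief after an unsuccessful search. The update is not spelled out in the text above; it is the standard conditioning step and is included here as an assumption of the sketch, as are the names `greedy_with_overlooking` and `update_after_miss` and the example numbers.

```python
from fractions import Fraction


def greedy_with_overlooking(belief, overlook):
    """States maximizing belief[s] * (1 - overlook[s]), the immediate detection probability."""
    scores = [z * (1 - d) for z, d in zip(belief, overlook)]
    best = max(scores)
    return [s for s, v in enumerate(scores) if v == best]


def update_after_miss(belief, overlook, searched):
    """Posterior over the object's state after an unsuccessful search, before it moves."""
    detect = belief[searched] * (1 - overlook[searched])
    if detect == 1:
        raise ValueError("an unsuccessful search has probability zero")
    post = list(belief)
    # The object can still be in the searched state only if it was overlooked.
    post[searched] = belief[searched] * overlook[searched]
    return [p / (1 - detect) for p in post]


# Hypothetical numbers: the first state is more likely but is overlooked half of the time.
belief = [Fraction(3, 5), Fraction(2, 5)]
overlook = [Fraction(1, 2), Fraction(0)]
```

In this example the greedy choice is the second state, since 2/5 exceeds 3/5 times 1/2, even though the first state is more likely to contain the object.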
• No active player. The model could also be adjusted to allow for the possibility that no player is active, i.e. $\sum_{i \in I} q_i < 1$. Then, $r := 1 - \sum_{i \in I} q_i$ is the probability that no player is active at a certain period. We still assume that $q_i > 0$ for each player $i$, so that $r < 1$. Our main result, Theorem 2, would still hold, as the key properties of Proposition 1 remain valid.

Variations where results break down or need adjustment
• Robustness to finite horizon and discounting. Let $\sigma$ be a greedy strategy profile. By Theorem 2, $\sigma$ is a subgame perfect equilibrium, and in particular, an equilibrium. By standard arguments, for every error term $\varepsilon > 0$, the strategy profile $\sigma$ is an $\varepsilon$-equilibrium if the horizon of the game is finite but sufficiently long. Also, $\sigma$ is a subgame perfect $\varepsilon$-equilibrium if, instead of maximizing the probability to find the object, now each player $i$ maximizes $\sum_{t=1}^{\infty} \delta^{t-1} \cdot z_{i,t}$, where $\delta \in (0, 1)$ is a discount factor and $z_{i,t}$ is the probability that player $i$ finds the object at period $t$.
The main reason why this robustness property holds is that, as long as at least one player plays a greedy strategy, the object will be found at an exponential rate, so essentially in finite time.
Note however that a greedy strategy profile is not necessarily a 0-equilibrium over a finite horizon. Indeed, consider the game represented in Figure 2 over $T = 2$ periods. There are three states. The initial probability distribution of the object is given by $\pi = (1/3 + \varepsilon, 1/3 + 2\varepsilon, 1/3 - 3\varepsilon)$, where $\varepsilon > 0$ is small enough so that $0.99 \cdot (2/3 + 3\varepsilon) \le 2/3$, and the transitions at period 1 are given by the arrows in Figure 2. There are two players, and player 1 is active at each period with probability $q_1 = 0.99$. The greedy strategy profile is to choose state 2 at period 1 and state 1 at period 2. Under this strategy profile, player 1 finds the object with probability $q_1 \cdot (2/3 + 3\varepsilon) \le 2/3$. However, it would be a profitable deviation for player 1 to choose state 3 at period 1 and state 1 at period 2. Indeed, player 1 is the active player at both periods with probability $(q_1)^2$, and in that case this strategy finds the object with probability 1.
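The numbers in this example can be checked directly. The first line makes the condition on $\varepsilon$ explicit, and the second compares the deviation payoff of player 1, which is at least $(q_1)^2$, with the bound $2/3$ on his payoff under the greedy strategy profile:

```latex
0.99 \Bigl(\tfrac{2}{3} + 3\varepsilon\Bigr) \le \tfrac{2}{3}
  \iff 3\varepsilon \le \tfrac{200}{297} - \tfrac{198}{297} = \tfrac{2}{297}
  \iff \varepsilon \le \tfrac{2}{891} \approx 0.00224 ,
\qquad
(q_1)^2 = 0.9801 > \tfrac{2}{3} .
```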
• Period-dependent probabilities $q_i$. It is crucial for our proofs that the probabilities with which the players become active do not depend on the period. The reason is that, when these probabilities vary over time, players have to balance between finding the object at the present period and putting the opponent into a difficult position in the next period.
• Infinitely many states. Assume $S = \mathbb{N}$. It is easy to see that it can be impossible to find the object with probability 1 if the transition law of the object is diffuse. Suppose that the object starts in states 1, 2 and 3 with equal probability, and then moves to $3^{t+1}$ new states with equal probability at each period $t$. In that case, the probability of ever finding the object is at most $1/3 + 1/9 + 1/27 + \cdots = 1/2$.
As a consequence, a greedy strategy profile might not be a Nash equilibrium, and vice-versa. Indeed, it might be very important to choose a certain state at period 1, even if it has a low probability of containing the object, so that such a diffusion of the object is prevented.
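The bound in this example can be written out as follows, under the reading that at period $t$ the object is spread uniformly over $3^t$ states, so that any single search at period $t$ succeeds with probability at most $3^{-t}$. Summing over periods gives a geometric series:

```latex
\Pr(\text{object is ever found})
  \;\le\; \sum_{t=1}^{\infty} 3^{-t}
  \;=\; \frac{1/3}{1 - 1/3}
  \;=\; \frac{1}{2} .
```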

Conclusion
In this paper we examined a search game in discrete time and space, with multiple competitive searchers who look for one object moving over finitely many locations. We showed that if each player's probability of being active is the same at every period, the set of subgame perfect equilibria is exactly the set of greedy strategy profiles (cf. Theorem 2). We discussed several variations, such as the finite truncation of the game, the discounted version of the game, cases with infinitely many states and overlooking probabilities, and we examined the possibility that no player is active at a period. A challenging task would be to investigate stochastic search games in which the probability of a player being active depends on the history.