Full vs. no information best choice game with finite horizon

Let us consider two companies, A and B. Both are interested in buying a set of goods. Company A is a big corporation: it knows the actual market value of each good and is able to observe the previous values. Company B has no information about the actual value of a good, but it can compare the current good's position on the market with the positions of the goods offered previously. Both players want to choose the best object overall. Recall is not allowed, and the number of objects is fixed and finite. One can think of these two types of buyers as a business customer vs. an individual customer. A mathematical model of the competition between them is presented, and its solution is defined and constructed.


Introduction.
The very well known secretary problem has many modifications. Ferguson [1] reviewed the concepts of the best choice problem, going back to the age of Kepler and Cayley. Presman and Sonin [2] considered the so-called no-information problem, in which the appearing objects come from the rank distribution, i.e. the objects are observable, the decision maker can rank them, and all permutations of the appearing objects are equally probable. Another approach was presented by Gilbert and Mosteller [3], where the exact value of the object is observable and the distribution of the objects is known (it is assumed to be the uniform distribution on the interval [0, 1]). Both ideas can be described as the optimal stopping of a Markov chain. In both of them there is only one decision maker and there is no competition. The game version of the secretary problem was introduced by Dynkin [4]. Many examples were solved by Yasuda [5,6].

Business motivation.
Consider two companies, A and B. Both are interested in buying a set of goods, e.g. an asset on a stock exchange. Company A is a big corporation and knows the actual market value of the good. Moreover, it knows the previous values of the objects and can compare them. The problem of company B is that it has no information about the actual value of the good. However, the owner of company B can compare the current position of the good on the market with the previous observations. Both players want to choose the best object overall, without the possibility of recall. The number of objects is fixed and finite. A good example comes from the field of reliability. Consider two buyers of the same item, both of whom want to buy the most reliable object. Buyer A is able to learn the values of the reliability function derived by experts and quality controllers. Buyer B has no such contacts or intelligence, so he must rely on his basic knowledge and on the previous observations, i.e. he can judge whether the current object is better or worse than the previous ones. We can say that the buyers are of two types: the first is a business customer and the second is an individual customer.

Related game models.
The considered game recalls various conflicts driven by stochastic sequences. Usually the bilateral setting of a decision problem is preceded by a unilateral consideration, and the form of the optimal strategy in the unilateral problem inspires the formulation of the game model. Threshold strategies are a crucial tool in optimal stopping problems. The simplest case, related to the observation of a sequence of random variables, can be found e.g. in [7] or [8]. A bilateral extension of these models can be found in [9]. Two players, I and II, sequentially observe a known finite number (or a number having a geometric distribution) of independent and identically distributed random variables. They must choose the largest. The variables cannot be perfectly observed: when a random variable is sampled, the sampler is informed only whether it is greater or less than some level he has specified. Each player can choose at most one observation. After the sampling, the players decide whether to accept or reject the observation. If both accept the same observation, Player I has priority. The class of adequate strategies and a gain function are constructed. In the finite case, the game has a solution in pure strategies. In the case of a geometric distribution, Player I has a pure equilibrium strategy and Player II has either a pure equilibrium strategy or a mixture of two pure strategies. The game is symmetric, as the players watch the same sequence to the same extent. The opposition of interests can be increased by giving the players completely different preferences. Evaluation of the same object by two decision makers can mean that the players observe different coordinates of a vector and formulate their expectations for its realization. When the players' aim is to achieve a minimum level of the observed rate, the problem can be reduced to a game in which the strategies are just the settings of those levels.
A discussion of such issues can be found in the works of Sakaguchi (e.g. [10]). In those tasks, however, although the players' information is incomplete, there is no clear asymmetry between the players. The pay-offs of the players are functions of the thresholds, and a perfect comparison of the observed variable with the declared levels is guaranteed. Asymmetric tools for measuring the observed random variables were presented by Sakaguchi and the second author [11], but for private random variables. Players applying such asymmetric tools to the same sequence are the subject of consideration in this paper.

Mathematical formulation of the problem.
In fine-tuning the mathematical model we will use the methods of optimal stopping of stochastic processes [12] for Markov sequences [13,14], and game models with optimal stopping of such sequences [4], similar to what is done in [15,16].
Let (Ω, F, P) be a probability space rich enough to define the random sequence {X_n}_{n=0}^N, X_· : Ω → R ⊂ ℜ, N ∈ ℕ ∪ {∞}. There are two observers (and at the same time decision makers) of the basic sequence, defined by the mappings {ϕ^i_n}, i = 1, 2, where ϕ^i_n : ℜ^n → ℜ, each having his aim defined by the pay-off function f_i : R → ℜ. In other words, player i at moment n observes ξ^i_n = ϕ^i_n(X_1, ..., X_n). The strategies of the players are stopping times τ ∈ S^i with respect to the appropriate filtrations F^i_n = σ{ξ^i_1, ..., ξ^i_n}. Each player, based on the observations available to him, is tasked with choosing the moment of accepting the state of the process so as to maximize his expected payment.
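The two information structures can be made concrete with a short simulation sketch (the helper names `relative_rank` and `observe` are ours, not from the paper): player II's observation stream consists of the raw values X_1, ..., X_n, while player I's stream consists only of the relative ranks Y_n.

```python
import random

def relative_rank(history):
    """Relative rank Y_n of the last element among the observations so far
    (Y_n = 1 means the last observation is a running maximum, a 'candidate')."""
    x = history[-1]
    return sum(1 for v in history if v >= x)

def observe(n_objects, seed=None):
    """Generate one realisation of the i.i.d. uniform sequence and the two
    players' observation streams: player II sees the values themselves,
    player I sees only the relative ranks."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_objects)]
    full_info = xs                                                    # player II
    no_info = [relative_rank(xs[: n + 1]) for n in range(n_objects)]  # player I
    return full_info, no_info
```

Note that player I's stream is a deterministic function of player II's, which is exactly the information asymmetry of the model.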
The aim of player i is to attain

v_i = sup_{τ ∈ S^i} E f_i(X_τ).   (1)

The problem can be reduced to the optimal stopping of conditional expected values relative to the player's own filtration [14]: for every n ∈ ℕ we calculate f̂^i_n = E[f_i(X_n) | F^i_n]. Let us assume that the observation processes {ξ^i_n}_{n=0}^N, i = 1, 2, are Markov processes. In this case the solution of problem (1) can be obtained by the procedure described by Shiryaev [14, Ch. 3], which is based on the Bellman–Jacobi equation. Denote S_n = {τ ∈ S : τ ≥ n}. When two decision makers hunt for a convenient state of the process, stopping may be declared at most twice: the second declaration is admissible only when the first accepted state has been assigned to the opponent. This determines the natural sets of strategies U^i, i = 1, 2. The pay-off in the competitive case can be defined in various ways. Following the discussion in [16], for given ρ_i ∈ U^i, i = 1, 2, the expected pay-offs depend on the priority parameter 0 ≤ p ≤ 1, i.e. the probability that a state claimed by both players will be assigned to player I. A pair of strategies solves the game when neither player can improve his expected pay-off by a unilateral deviation. In practice it is difficult to construct the solution and to calculate the value of the problem in such a general form. However, for some natural cases each player can estimate his final reward by calculating his potential reward based on his own knowledge (filtration). The idea of these simplifications is presented in the next sections, under the following information pattern: (1) Player I has no information, i.e. he observes only the relative ranks of the current objects. (2) Player II has full information, i.e. he sequentially observes the i.i.d. sequence X_1, ..., X_N, sees the values, and can also calculate the rank of the current object.
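The Bellman-type backward induction behind problem (1) can be illustrated on the simplest textbook stopping problem: maximizing E X_τ for i.i.d. uniform observations. This is not the game studied in this paper, only a sketch of the recursion v_{k+1} = E[max(X, v_k)] under that simplifying assumption (the function name is ours):

```python
def stopping_values(horizon):
    """Backward induction for stopping an i.i.d. U[0, 1] sequence to
    maximize the expected accepted value: v_k is the value with k
    observations still to come, v_0 = 0 and
    v_{k+1} = E[max(X, v_k)] = v_k**2 + (1 - v_k**2) / 2 = (1 + v_k**2) / 2."""
    v = 0.0
    values = []
    for _ in range(horizon):
        v = (1.0 + v * v) / 2.0  # one step of the Bellman recursion
        values.append(v)
    return values  # values[k-1] = v_k
```

The optimal rule is of threshold type, exactly as in the models above: with k observations remaining, stop iff the current value is at least v_k.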
To be more specific, let us denote by Y_n the relative rank of the n-th observation, Y_n = #{1 ≤ k ≤ n : X_k ≥ X_n} (so Y_n = 1 means that X_n is a running maximum, a candidate). The filtration of player II is F^(1)_n = σ(X_1, ..., X_n), and the filtration of player I is coarser for every n. Denote by T_1 the set of all stopping times with respect to the family {F^(1)_n}_{n=1}^N. Let T^0_1 denote the set of all stopping times τ ∈ T_1 such that X_n = max{X_1, ..., X_n} on {τ = n}, n = 1, ..., N. Define the moments at which the biggest observation so far appears, i.e. τ_1 = 1, τ_k = inf{n : τ_{k−1} < n ≤ N, X_n = max{X_1, ..., X_n}} for k = 2, ..., N. We observe that the sequence τ_1, τ_2, ... ∈ T^0_1. Now let us consider the chain Z_k = (τ_k, X_{τ_k}) if τ_k ≤ N and Z_k = ∂ otherwise, where ∂ is a special absorbing state. It is easy to see that {Z_k}_{k=1}^{N+1} is a Markov chain with transition probabilities (cf. [17])

p((n, x), (m, B)) = x^{m−n−1} ∫_B dy,  m > n,   (5)

and 0 otherwise, with B ⊆ (x, 1]. The reward for player II for stopping at the n-th object of value X_n = x is s_{2,n}(x) = x^{N−n}, the probability that no later observation exceeds x, and the reward c_{2,n}(x) for continuing the observation is given by Gilbert and Mosteller [3,17]. In a similar way, for player I consider the sequence of indicators {I_n}_{n=1}^N, where I_k = I_{{Y_k = 1}}. Let us denote by G_n = σ(I_1, ..., I_n) the sequence of σ-fields generated by the indicators, and let T_2 be the set of all stopping times τ with respect to the σ-fields G_n, n = 1, ..., N. Define the process ξ_t of consecutive candidate moments, with initial point ξ_0 = 1, and calculate its transition probabilities [13]. The first player's reward for stopping at the n-th candidate (i.e. when Y_n = 1) is s_{1,n} = n/N, the probability that this candidate is the best object overall, and the reward c_{1,n} for continuing the observation is defined analogously.
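The first player's stopping reward s_{1,n} = n/N — the chance that a candidate at moment n is the overall best — can be verified exactly by brute force over all orderings for small N. The function below is a hypothetical helper, not part of the paper:

```python
from fractions import Fraction
from itertools import permutations

def prob_best_given_candidate(n, N):
    """Exact P(object at moment n is the overall best | Y_n = 1),
    computed by enumerating all N! orderings of the ranks 0..N-1."""
    candidate = best = 0
    for perm in permutations(range(N)):
        if perm[n - 1] == max(perm[:n]):   # relative rank 1 at moment n
            candidate += 1
            if perm[n - 1] == N - 1:       # also the overall maximum
                best += 1
    return Fraction(best, candidate)
```

The enumeration reproduces n/N, independently of the underlying continuous distribution, because only the relative order of the observations matters.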

Equilibrium states.
Suppose that we are at some moment n, the value of the current candidate is x, and both players want to stop (strategy S). If player II gets the object (with probability 1 − p), he obtains the reward s_{2,n}(x); with probability p player I gets the object, so player II must continue the observations and obtains the reward c_{2,n}(x). The possibility that the opponent will find the best object in the future is already included in this reward. A similar consideration gives the pay-off of player I. Let us denote

w_{1,n} = p s_{1,n} + (1 − p) c_{1,n}  and  w_{2,n}(x) = (1 − p) s_{2,n}(x) + p c_{2,n}(x).

The pay-off matrix of the game, with the strategies S (stop) and F (continue) for each player, is built from these quantities and from the Bellman operator T [14]. Since both players want to maximize their profits, the conditions for (S, S) to be a Nash equilibrium lead to the following equations. For player I the optimal stopping moments are those n ≥ n*, where n* is the standard optimal threshold (cf. [3]),

n* = min{n ≥ 1 : Σ_{k=n}^{N−1} 1/k ≤ 1},

and the optimal stopping set for player II is the same as in the standard optimal stopping problem, {(n, x) : X_n = max{X_1, ..., X_n}, x ≥ x_n}, where x_n is the solution of the equation

Σ_{j=1}^{N−n} (x^{−j} − 1)/j = 1.

The optimal stopping times are: for player I, τ_1 = inf{n ≥ n* : Y_n = 1}, and for player II, τ_2 = inf{n : X_n = max{X_1, ..., X_n} ≥ x_n}.

Lemma 1. In the game described above the strategy (S, S) is a pure Nash equilibrium if X_n is a local maximum and X_n ≥ x_n, n ≥ n*.
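Assuming the classical thresholds of the secretary problem and of Gilbert and Mosteller [3], both cut-offs can be computed numerically. The function names are ours; the indifference equation Σ_{j=1}^{r} (x^{−j} − 1)/j = 1 is the classical one from [3], solved here by bisection:

```python
from fractions import Fraction

def n_star(N):
    """Classical no-information threshold:
    the smallest n with sum_{k=n}^{N-1} 1/k <= 1."""
    s = Fraction(0)
    n = N
    while n > 1 and s + Fraction(1, n - 1) <= 1:
        s += Fraction(1, n - 1)
        n -= 1
    return n

def gm_level(r, tol=1e-12):
    """Gilbert-Mosteller indifference level with r observations still to come:
    solves sum_{j=1}^{r} (x**(-j) - 1)/j = 1 by bisection on (0, 1)."""
    def g(x):
        return sum((x ** (-j) - 1.0) / j for j in range(1, r + 1)) - 1.0
    lo, hi = 1e-6, 1.0 - 1e-12   # g(lo) > 0 > g(hi); g is decreasing in x
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For instance, `n_star(100)` gives the familiar cut-off 38, and `gm_level(1)` gives 1/2, the indifference level when a single observation remains.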
Let us consider the case p < 0.5. Suppose that n ≥ n*, the value of the current observation is X_n = x ≤ x_n, and its relative rank is 1. Below this threshold it is better for player II to change his strategy to F. The best response of player I to this strategy of the opponent is to keep stopping (strategy S) if the expected future reward is not greater than the actual reward, T v_{1,n} ≤ w_{1,n}. To calculate the expected future reward we have to apply the Bellman operator to the pay-off, obtaining T v_{1,n}. The no-information player knows that his opponent has more information: since the opponent chooses the strategy F, the present value of the object must be less than the threshold x_n. If this value x were known, the future pay-off could be written down explicitly in terms of a ∨ b = max{a, b}. However, it has to be averaged: knowing that the actual value is uniformly distributed on the interval [0, x_n] (since the opponent wishes to continue the observations), we obtain T v_{1,n} by averaging over this interval. Let us consider the set M_1 = {n* < n ≤ N : T v_{1,n} ≤ w_{1,n}}. Note that this set is not empty, since it contains N. Using backward induction we can find its lower bound, i.e. the index ñ = min M_1.
Lemma 2. Suppose that the current state of the process (n, X_n) is such that n ≥ ñ, X_n = x ≤ x_n, and X_n is a local maximum. Then the strategy (S, F) is a pure Nash equilibrium in the game described above.
Now suppose that n = ñ − 1 and the current state of the process is (n, X_n = x), where x ≤ x_n. Since player I changes his strategy to F, it is necessary to check whether the condition T v_{2,n}(x) ≥ w_{2,n}(x) is satisfied. Indeed, it is: Bellman's operator is the expected value of the future reward, and since the current reward satisfies w_{2,n}(x) < 0 while the future reward is positive for p < 0.5, the action F is better for player II. The same considerations can be repeated for ñ − 2, ñ − 3, etc. From these considerations we obtain the following.

Lemma 3. Suppose that the current state of the process (n, X_n) is such that n < ñ, X_n = x ≤ x_n, and X_n is a local maximum. Then the strategy (F, F) is a pure Nash equilibrium in this state of the game described above.

Now consider the case when n = n* − 1 and X_n = x > x_n is a local maximum. This is the opposite situation: player II prefers to stop, but player I prefers to continue the observations. To find out whether the strategy (F, S) is an equilibrium point, we have to check whether player II's reward from stopping dominates his expected future reward. This reduces to a condition containing a double sum. The expression under the double sum is positive on the interval [0, 1]; hence the left-hand side is a positive number, which is always bigger than the right-hand side, which is negative. Therefore, for n = n* − 1 and x > x_n it is better for player II not to change his strategy. Continuing these calculations, we find that it is also better for him not to change his strategy when n < n* and x > x_n.
Lemma 4. Suppose that the current state of the process (n, X_n) is such that n < n*, X_n = x > x_n, and X_n is a local maximum. Then the strategy (F, S) is a pure Nash equilibrium in the game described above.
Lemma 5. Suppose that the current state of the process (n, X_n) is such that n < n*, X_n = x ≤ x_n, and X_n is a local maximum. Then the strategy (F, F) is a pure Nash equilibrium in the game described above.

Conclusion.
The model presented in this work arose from reflection on real problems in the field of business and finance. In the competition between two opponents, one of whom has access to more data, we have found the equilibrium states. If the priority parameter p of the no-information player satisfies p < 0.5, the no-information player has to change his strategy in comparison with the situation in which he plays alone; the full-information player, however, does not need to change his strategy.
The numerical examples presented are a good illustration of the model.