# Full-information best choice game with hint

## Abstract

In the classical full-information best choice problem a decision maker aims to select the best of a sequence of opportunities, basing the decision only on the exact values of the observations seen so far. In this paper we consider two modifications of this problem. We add a second player who can either offer additional information or block the observed object and demand an extortion payment. Our goal is to establish the optimal reward for the second player and the best moment to interrupt the decision process. We also study the limiting behaviour as the number of observations tends to infinity.

## Keywords

Optimal stopping · Best choice problem · Matrix game · Markov chain · Threshold strategy

## Mathematics Subject Classification

90C40 · 60G40

## 1 Introduction and literature review

Best choice problems are among the most inspiring problems of modern mathematics. Their origin is the so-called secretary problem; a comprehensive account of this issue can be found in Ferguson (1989). Gilbert and Mosteller (1966) considered different variants of best choice problems for the very first time and solved them by heuristic arguments. They categorize the rank-based problem as the no-information case (which includes the classical secretary problem). On the other hand, we have the so-called full-information case, where we may base the choice of the stopping time on the true values of the objects. This is a much more complex problem. In other words, the no-information problem is a simplification of the full-information problem: it is always possible to calculate the rank of the currently observed object by counting how many predecessors were better, but the opposite operation is not possible. In this work we focus on the full-information case. A Markovian approach, which is widely used in this article, was presented by Bojdecki (1978). Exact solutions for the initial problems were presented by Samuels (1982). Many modifications of the problem have been made. Porosiński (1987) presented a model in which a random horizon was introduced. The link between the infinite problem and the planar Poisson process was presented in Gnedin (1996), and following that, exact results for the initial problem were derived by Gnedin and Miretskiy (2007). The full-information best choice problem where two choices are allowed was presented in Porosiński (1992). Petruccelli (1982) allowed solicitation in the choice. A modification in which the decision maker can go back only a fixed number of steps was considered by Tamaki (1986). Recently, Kuchta (2017) derived an optimal strategy for the iterated full-information best choice problem.

The game version of the best choice problem has been treated by many authors, e.g. Porosiński and Szajowski (1996), Sakaguchi and Szajowski (1997), Sakaguchi (1984). The game with a hint was presented in Dotsenko and Marynych (2014); the authors considered the no-information case, in which the decision maker can observe only the ranks of the objects.

## 2 Preliminaries: full-information best choice problem

Fix \(n\in \mathbb {N}\) and consider an i.i.d. sequence \(X_1,\ldots ,X_n\) from a continuous distribution *F*(*x*). Without loss of generality we can assume that it is the uniform distribution \(\mathcal {U}(0,1)\), i.e. \(F(x)=x\), \(x\in [0,1]\). Define a filtration \(\mathcal {A}_j\) and let \(\tau _j\) denote the successive moments at which relative maxima are observed. In the case when \(\{ \tau _j = \infty \}\) we introduce a special absorbing state \(\delta \). The sequence \(\xi _j=(\tau _j,X_{\tau _j})\) in (4) is a homogeneous Markov chain on the state space \(\mathfrak {E}=\{1,\ldots ,n\}\times (0,1)\cup \{\delta \}\) with sigma algebra \(\mathcal {E}\). One step transition probabilities (i.e. \(P(\xi _{j+1}=\mathfrak {b}\,|\,\xi _j=\mathfrak {a})\)) for the above chain are defined for \(\mathfrak {a}=(k,x)\) by

$$\begin{aligned} p(\mathfrak {a},\mathfrak {b})= {\left\{ \begin{array}{ll} x^{l-k-1}, &{} \mathfrak {b}=(l,y),\ l>k,\ y>x, \\ x^{n-k}, &{} \mathfrak {b}=\delta . \end{array}\right. } \end{aligned}$$
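To make the dynamics of this chain concrete, note that from a relative maximum of value *x* at moment *k* the chain is absorbed in \(\delta \) exactly when none of the remaining \(n-k\) observations exceeds *x*, which happens with probability \(x^{n-k}\). The following Monte Carlo sketch (an illustration added here, not part of the original derivation; the function name is ours) checks this numerically:

```python
import random

def prob_no_new_record(n, k, x, trials=200_000, seed=1):
    """Estimate the probability that none of X_{k+1},...,X_n exceeds x,
    i.e. that the chain of relative maxima is absorbed in delta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # the record chain starting from (k, x) dies iff all remaining
        # n - k uniform observations fall below x
        if all(rng.random() <= x for _ in range(n - k)):
            hits += 1
    return hits / trials

n, k, x = 10, 6, 0.7
est = prob_no_new_record(n, k, x)
exact = x ** (n - k)  # absorption probability: 0.7**4 = 0.2401
```

The empirical frequency agrees with \(x^{n-k}\) up to Monte Carlo error.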

If the *k*th object is relatively the best and its value is \(X_k=x\), then by selecting it we obtain a gain equal to the probability that it is the overall best,

$$\begin{aligned} g(k,x)=x^{n-k}. \end{aligned}$$

Let *T* be the operator of conditional expectation and let \(V(\cdot )\) be the value function of the problem (cf. Shiryayev 1978). From general theory we know that *V* satisfies a Bellman equation

$$\begin{aligned} V(\mathfrak {a})=\max \left( g(\mathfrak {a}),\, TV(\mathfrak {a})\right) . \end{aligned}$$

### Theorem 1

The proof can be found in Bullen (2013) and is omitted here.
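The optimal rule for the classical problem is a threshold rule: stop at a relative maximum of value *x* at moment *k* iff \(x\ge d_{n-k}\). As a hedged numerical illustration (using the classical indifference equation of Gilbert and Mosteller (1966), which may differ in notation from the equation of this paper; the helper name is ours), the thresholds can be computed by bisection:

```python
from math import comb

def threshold(i, tol=1e-12):
    """Solve the Gilbert-Mosteller indifference equation
    sum_{j=1}^{i} C(i,j) * ((1-d)/d)^j / j = 1  for d_i by bisection."""
    def g(d):
        t = (1.0 - d) / d
        return sum(comb(i, j) * t ** j / j for j in range(1, i + 1)) - 1.0
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # g is decreasing in d, so the root lies to the right when g > 0
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

d = [threshold(i) for i in range(1, 6)]
# d[0] = 0.5 exactly, d[1] ~ 0.6899, and the sequence increases towards 1
```

For one remaining observation the equation reduces to \((1-d)/d=1\), giving \(d_1=1/2\), which the code reproduces.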

### Lemma 1

*i*

### Proof

*x* and it is greater than 1 for all \(x<d_i\), it is sufficient to show that the inequality

where *z* is the unique solution of the equation

## 3 The best choice problem with hint

### 3.1 The model

Suppose that, besides the decision maker (further denoted as DM) of the full-information best choice problem, there is a second player. He does not make decisions about stopping and choosing the best object; instead, he has extra information about the best object, i.e. he knows exactly both the position and the value of the current element. We will call him a prompter or a prophet (further denoted as PR). His aim is to sell this information at the proper moment and to get as much as possible for it. PR must set the price \(\alpha \) for the hint before the beginning of the game and he can sell his knowledge only once during the game. The decision maker can accept this proposition, pay the fixed price and learn whether the current object is the best one or not. He can also reject the purchase option and then stop or continue the observations.

The state of the game is described by the moment *k* and the actual value of the observed object *x*, i.e. by the state \(\mathfrak {a}=(k,x)\). The payoff function for PR is written at the bottom of the graph.

Suppose that we are in the state (*k*, *x*), i.e. at a moment *k* we observe a locally maximal object \(X_k\) whose value is \(x\), \(x\in (0,1)\). There are two possibilities: \(x<d_{n-k}\) and \(x\ge d_{n-k}\).
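Under the classical threshold rule \(x\ge d_{n-k}\), the DM's success probability can be estimated by simulation. The sketch below (an illustration under the assumption that the thresholds solve the Gilbert–Mosteller indifference equation; function names are ours) estimates it for a moderate horizon — for large *n* the win probability is known to approach roughly 0.5802:

```python
from math import comb
import random

def threshold(i, tol=1e-10):
    """d_i from the Gilbert-Mosteller indifference equation."""
    if i == 0:
        return 0.0  # no future observations: accept any relative maximum
    def g(d):
        t = (1.0 - d) / d
        return sum(comb(i, j) * t ** j / j for j in range(1, i + 1)) - 1.0
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def win_probability(n, trials=20_000, seed=7):
    """Monte Carlo estimate of P(the threshold rule picks the overall best)."""
    d = [threshold(i) for i in range(n)]  # d[i]: threshold with i steps left
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        best, running = max(xs), 0.0
        for k, x in enumerate(xs, start=1):
            if x > running:          # relative maximum at moment k
                running = x
                if x >= d[n - k]:    # stop here
                    wins += x == best
                    break
        # never stopping counts as a loss
    return wins / trials
```

For small *n* the estimate exceeds the limiting value, consistent with the win probability decreasing in *n*.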

### Fact 1

Let \(x\le d_{n-k}\) for \(k\in \{1,\ldots ,n\}\). The function \(v_1(k,x)\) is an increasing function of *x*.

### Proof

Since *Tv*(*k*, *x*) is decreasing as *x* approaches the threshold \(d_{n-k}\), the whole function, as a product of increasing functions, is increasing. \(\square \)

For the function \(v_2(i,x)\) we have

### Fact 2

Let \(x\ge d_{n-k}\) for \(k\in \{1,\ldots ,n\}\). Then, the function \(v_2(k,x)\) is a decreasing function of *x*.

### Proof

*x* and let \(i=n-k\). We obtain

*x* from the domain of the function \(v_2(i,x)\). From the description of the problem we know that

*x*. \(\square \)

*k* the maximal value of \(v_1\) is

### Lemma 2

*i* and

where *z* is given by (13).

### Proof

Let us recall the principle of optimality. An optimal policy has the property that whatever the initial state and the initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision (cf. Bellman 1957). To find the exact form of the value function \(v_1(k,x)\) we can consider the remaining observations as the best choice problem for a random horizon. To be more specific, let us consider the following

### Lemma 3

Assume that for a fixed horizon *n* the current state of the process is (*k*, *x*) for some \(k \in \{ 1,2,\ldots ,n \}\) and \(x\in (0,1)\). Suppose that the process has not been stopped yet. Then, the optimal strategy is to stop at the first (if any) state \((k+m,u)\) such that \(0\le m\le n-k\) and \(u\ge u_{n-k-m}(x)\), where \(u_{n-k-m}(x)\) is the solution of the equation

### Proof

Since the current observation has value *x* and it is a relative maximum, we can truncate the further chain to only those observations that are greater than *x*. The probability that an observation at the moments \(k+1, k+2, \ldots , n\) is bigger than *x* is \(1-x\). Therefore, from now on, we consider a full-information best choice problem with a random horizon, with observations from the uniform distribution on the interval (*x*, 1). The horizon *M* is binomially distributed, i.e.

$$\begin{aligned} P(M=m)=\left( {\begin{array}{c}n-k\\ m\end{array}}\right) (1-x)^m x^{n-k-m}, \quad m=0,1,\ldots ,n-k. \end{aligned}$$

It is known that if the corresponding sequence changes its sign *K* times, then the stopping region has no more than *K* stopping islands. However, here \(\{d(m,u)\}_{m=0}^{n-k}\) changes the sign at most one time. When *k* is close to *n*, its value decreases to 0. So, the truncated problem is monotone and the optimal strategy is a threshold strategy. The thresholds \(u_{n-k-m}(x)\) can be calculated directly from
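As a quick sanity check of the binomial-horizon claim (illustrative code added here, not from the paper; names are ours): the number *M* of future observations exceeding the current record *x* has mean \((n-k)(1-x)\).

```python
import random

def sample_M(n, k, x, rng):
    """Number of observations among X_{k+1},...,X_n that exceed x."""
    return sum(rng.random() > x for _ in range(n - k))

rng = random.Random(3)
n, k, x, trials = 20, 8, 0.6, 100_000
mean_M = sum(sample_M(n, k, x, rng) for _ in range(trials)) / trials
# theoretical mean of Bin(n-k, 1-x): (n - k) * (1 - x) = 12 * 0.4 = 4.8
```

The empirical mean matches \((n-k)(1-x)\) up to simulation error.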

### Remark 1

The value *x* of the current object must be bigger than a threshold value. This threshold is \(u_{n-k}(x)\). It can be calculated from (25) for \(m=0\)

- 1.
Set the constant price \(\alpha \) during the whole game

- 2.
Set the vector of the prices depending on the moment of the game: \(\alpha =(\alpha _1,\ldots ,\alpha _n)\)

- 3.
Set the price function depending on the value of the current observation: \(\alpha =\alpha (x)\)

- 4.
Set the vector of the prices depending on the moment of the game and the current value of the observed object: \(\alpha =(\alpha _1(x),\ldots ,\alpha _n(x))\)

### 3.2 \(\alpha = const\)

- \(\alpha \ge 0.25\): then, the hint is not worth buying. The price is higher than its value.
- \(e^{-z}(1-e^{-z})< \alpha < 0.25\): then, the hint is worth buying for \(k=k^*,k^*+1,k^*+2,\ldots ,n\), where$$\begin{aligned} k^* = \min \left\{ 1\le k\le n : a_{n-k} \ge \alpha \right\} . \end{aligned}$$(30)
- \(\alpha <e^{-z}(1-e^{-z})\): then, the hint is worth buying for \(k=1,2,\ldots ,n\) no matter how big *n* is. Using the previous notation we can say that in this case \(k^*=1\).

Suppose that at a moment *k* we observe a relatively maximal element whose value is *x* and it is worth buying a hint. The probability of that event is given by

## 4 The best choice problem with extortion

### 4.1 The model

Suppose that we are in the state (*k*, *x*), i.e. at a moment *k* we observe an object \(X_k\) whose value is \(x\), \(x\in (0,1)\). There are two possibilities: \(x<d_{n-k}\) and \(x\ge d_{n-k}\).

*D* is defined in (9). The equality holds for \(\alpha =0\). It implies that \(t_{n-k}(\alpha )\ge d_{n-k}\).

*D* earlier. The probability of that event is

### Theorem 2

### 4.2 The limiting values

## 5 Conclusion

In the world around us, despite widespread access to information, there are still cases where certain information is obscured and inaccessible. Access to it can be extremely valuable. Obtaining it is not always possible, but a special occasion to buy it may arise; in such cases the profitability of the purchase should be seriously considered. The above models grew out of these considerations. The aim of this work was to construct a mathematical model describing the mechanism of obtaining additional information in various market situations. Usually such information is secret and difficult to obtain; hence, in the model there is a single prompter who holds exclusive information. The model can be extended. One possibility is to introduce more than one decision maker into the game; then the prompter sells to the player who offers the most. Another possibility is the appearance of more sellers of information, in which case their knowledge about one another must be taken into account. In the first model we have found the formula for the optimal price of the hint and shown that the value of the hint has its limits. In the second model, the prompter behaves more like an extortioner and blocks the ability to stop; here, too, the game can be extended with additional players. For this game we have found an equilibrium price and the optimal strategy for the prompter, and the limit of the extortion payment as the number of observations goes to infinity has been derived.

## Notes

### Acknowledgements

The author would like to express his gratitude to professor Krzysztof Szajowski for his comments and discussions which helped to improve the quality of the paper.

## References

- Bellman R (1957) Dynamic programming, 1st edn. Princeton University Press, Princeton
- Bojdecki T (1978) On optimal stopping of a sequence of independent random variables: probability maximizing approach. Stoch Process Appl 6(2):153–163. https://doi.org/10.1016/0304-4149(78)90057-1
- Bullen P (2013) Handbook of means and their inequalities. Mathematics and its applications. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-0399-4
- Dotsenko SI, Marynych AV (2014) Hint, extortion, and guessing games in the best choice problem. Cybern Syst Anal 50(3):419–425. https://doi.org/10.1007/s10559-014-9630-8
- Ferguson TS (1989) Who solved the secretary problem? Statist Sci 4(3):282–289. https://doi.org/10.1214/ss/1177012493
- Gilbert JP, Mosteller F (1966) Recognizing the maximum of a sequence. J Am Stat Assoc 61(313):35–73
- Gnedin AV (1996) On the full information best-choice problem. J Appl Probab 33(3):678–687
- Gnedin AV, Miretskiy DI (2007) Winning rate in the full-information best-choice problem. J Appl Probab 44(2):560–565
- Kuchta M (2017) Iterated full information secretary problem. Math Methods Oper Res 86(2):277–292. https://doi.org/10.1007/s00186-017-0594-0
- Petruccelli JD (1982) Full-information best-choice problems with recall of observations and uncertainty of selection depending on the observation. Adv Appl Probab 14(2):340–358
- Porosiński Z (1987) The full-information best choice problem with a random number of observations. Stoch Process Appl 24(2):293–307. https://doi.org/10.1016/0304-4149(87)90020-2
- Porosiński Z (1992) The full-information best choice problem with two choices. In: Gritzmann P, Hettich R, Horst R, Sachs E (eds) Operations research ’91: extended abstracts of the 16th symposium on operations research held at the University of Trier at September 9–11, 1991. Physica-Verlag HD, Heidelberg, pp 278–281. https://doi.org/10.1007/978-3-642-48417-9_77
- Porosiński Z, Szajowski K (1996) On continuous-time two person full-information best choice problem with imperfect observation. Sankhyā Ser A 58(2):186–193
- Sakaguchi M (1973) A note on the dowry problem. Rep Statist Appl Res Un Japan Sci Engrs 20(1):11–17
- Sakaguchi M (1984) Best choice problems with full information and imperfect observation. Math Japon 29(2):241–250
- Sakaguchi M, Szajowski K (1997) Single-level strategies for full-information best-choice problems. I. Math Japon 45(3):483–495
- Samuels SM (1982) Exact solutions for the full information best choice problem. Technical Report 82-17, Department of Statistics, Purdue University
- Shiryayev AN (1978) Optimal stopping rules. Springer, New York
- Tamaki M (1986) A full-information best-choice problem with finite memory. J Appl Probab 23(3):718–735

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.