A prophet inequality for $L^p$-bounded dependent random variables

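Let $X = (X_n)_{n \ge 1}$ be a sequence of arbitrarily dependent nonnegative random variables satisfying the boundedness condition $\sup_\tau E X_\tau^p \le t$, where $t > 0$ and $1 < p < \infty$ are fixed numbers and the supremum is taken over all finite stopping times of $X$.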

We start with the necessary background and notation. Assume that $X = (X_n)_{n \ge 1}$ is a sequence of (possibly dependent) random variables defined on the probability space $(\Omega, \mathcal F, P)$. With no loss of generality, we may assume that this probability space is the interval $[0, 1]$ equipped with its Borel subsets and Lebesgue measure. Let $(\mathcal F_n)_{n \ge 1} = (\sigma(X_1, X_2, \ldots, X_n))_{n \ge 1}$ be the natural filtration of $X$. The problem can be stated in general terms as follows: under some boundedness condition on $X$, find universal inequalities which compare $M = E \sup_n X_n$, the expected supremum of the sequence, with $V = \sup_\tau E X_\tau$, the optimal stopping value of the sequence; here $\tau$ runs over the class $\mathcal T$ of all finite stopping times adapted to $(\mathcal F_n)_{n \ge 1}$. The term "prophet inequality" arises from the optimal-stopping interpretation of $M$, which is the optimal expected return of a player endowed with complete foresight; this player observes the sequence $X$ and may stop whenever he wants, receiving a reward equal to the variable at the time of stopping. With complete foresight, such a player obviously always stops when the largest value is observed, and on average his reward is equal to $M$. On the other hand, the quantity $V$ corresponds to the optimal return of the non-prophet player.
Let us mention several classical results in this direction; for an excellent exposition of the subject, we refer the interested reader to the survey by Hill and Kertz [12]. The first universal prophet inequality is due to Krengel, Sucheston and Garling [17,18]: if $X_1, X_2, \ldots$ are independent and nonnegative, then $M \le 2V$ and the constant 2 is the best possible. The next result, coming from [8] and [10], states that if $X_1, X_2, \ldots$ are independent and take values in $[0, 1]$, then
$$M - V \le \frac{1}{4} \qquad \text{and} \qquad M \le 2V - V^2.$$
Both estimates are sharp: equalities may hold for some non-trivial sequences $X$. Analogous inequalities for other types of variables $X_1, X_2, \ldots$ (e.g., arbitrarily dependent and uniformly bounded, i.i.d., averages of independent r.v.'s, exchangeable r.v.'s, etc.), as well as for other stopping options (for instance, stopping with partial recall, stopping several times, using only threshold stopping rules, etc.) have been studied intensively in the literature and have found many interesting applications. We refer the reader to the papers cited at the beginning.

The motivation for the results obtained in this paper comes from the following statement proved by Hill and Kertz [11]: if $X_1, X_2, \ldots$ are arbitrarily dependent and take values in $[0, 1]$, then we have the sharp bound
$$M \le V - V \ln V. \tag{1.1}$$
There is a very natural problem concerning the $L^p$-version of this result, where $p$ is a fixed number between 1 and infinity. For example, consider the following question. Suppose that $X_1, X_2, \ldots$ are nonnegative random variables satisfying $\sup_n E X_n^p \le 1$. What is the analogue of (1.1)? Unfortunately, as we will see in Sect. 5 below, there is no non-trivial prophet inequality in this setting. More precisely, for any $K > 0$ one can construct a sequence $X$ bounded in $L^p$ which satisfies $V = 1$ and $M \ge K$.
We will work under the more restrictive assumption
$$\sup_{\tau \in \mathcal T} E X_\tau^p \le t, \tag{1.2}$$
where $t$ is a given positive number. For instance, this holds true if the sequence $X$ possesses a majorant $\xi$ which satisfies $E \xi^p \le t$.
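The main result of the paper is the following.

Theorem 1.1
Let $X_1, X_2, \ldots$ be nonnegative random variables satisfying (1.2) for some $1 < p < \infty$ and $t > 0$. Then
$$M \le V + \frac{V}{p-1} \log\left(\frac{te}{V^p}\right) \tag{1.3}$$
and the inequality is sharp: the bound on the right cannot be replaced by a smaller number.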
Note that this statement generalizes the inequality (1.1) of Hill and Kertz: it suffices to take $t = 1$ and let $p \to \infty$ to recover the bound. On the other hand, the expression on the right of (1.3) explodes as $p \downarrow 1$, which indicates that there is no prophet inequality in the limit case $p = 1$.
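The two limiting remarks above can be checked numerically. The sketch below assumes that the right-hand side of (1.3) has the form $V + \frac{V}{p-1}\log(te/V^p)$; the function names and the two-point toy sequence ($X_1 = 1$, $X_2 = q^{-1/p}$ with probability $q$ and $0$ otherwise) are illustrative choices made here, not objects taken from the paper.

```python
import math


def prophet_bound(V, t, p):
    """Assumed right-hand side of (1.3): V + V/(p-1) * log(t*e / V**p)."""
    if not (p > 1 and V > 0 and t > 0):
        raise ValueError("need p > 1, V > 0 and t > 0")
    # log(t*e / V**p) is expanded so that V**p cannot underflow for large p.
    return V + V / (p - 1) * (1.0 + math.log(t) - p * math.log(V))


def hill_kertz_bound(V):
    """Right-hand side of (1.1) for [0, 1]-valued sequences."""
    return V - V * math.log(V)


def two_point_example(p, q):
    """Hypothetical toy sequence: X_1 = 1, X_2 = q**(-1/p) w.p. q, else 0.

    F_1 is trivial, so a stopping time is either identically 1 or at
    least 2; this makes M, V and sup_tau E X_tau^p explicit.
    """
    a = q ** (-1.0 / p)
    M = q * max(a, 1.0) + (1.0 - q) * 1.0   # E max(X_1, X_2)
    V = max(1.0, q * a)                     # best of "stop at 1" and "stop at 2"
    t = max(1.0, q * a ** p)                # sup over stopping times of E X_tau^p
    return M, V, t


if __name__ == "__main__":
    print(prophet_bound(0.5, 1.0, 1e6), hill_kertz_bound(0.5))  # nearly equal
    print([prophet_bound(0.5, 1.0, p) for p in (2.0, 1.1, 1.01)])  # blows up
    M, V, t = two_point_example(2.0, 0.25)
    print(M, V, t, prophet_bound(V, t, 2.0))
```

In the toy example the prophet does strictly better than the stopper even though $V^p = t$, and the value stays below the assumed bound.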
A few words about the proof. Our approach is based on the following two-step procedure: first we show that it suffices to establish (1.3) under the additional assumption that X is a nonnegative supermartingale; second, we prove that in the supermartingale setting, the validity of (1.3) is equivalent to the existence of a certain special function which enjoys appropriate majorization and convexity properties. In the literature this equivalence is often referred to as Burkholder's method or Bellman function method, and it has turned out to be extremely efficient in numerous problems in probability and analysis: consult e.g. [7,20,21,26] and references therein.
We have organized the paper as follows. In the next section we reduce the problem to the supermartingale setting. Section 3 contains the description of Burkholder's method (or rather its variant which is needed in the study of (1.3) for supermartingales). In Sect. 4 we apply the method and provide the proof of Theorem 1.1. In the final part of the paper we show that there are no interesting prophet inequalities when the variables $X_1, X_2, \ldots$ are only assumed to be bounded in $L^p$.

A reduction
In this section we show how to relate (1.3) to a certain inequality for nonnegative supermartingales. Throughout, we use the notation $X^* = \sup_{n \ge 1} X_n$ and $X^*_m = \sup_{1 \le n \le m} X_n$ for the maximal and the truncated maximal function of $X$. Recall that $V = \sup_{\tau \in \mathcal T} E X_\tau$. We start with the observation that it is enough to deal with finite sequences only (in this paper we say that $X$ is finite if it is of the form $(X_1, X_2, \ldots, X_{N-1}, X_N, X_N, X_N, \ldots)$ for some deterministic $N$). This is straightforward: suppose we have established the prophet inequality in this special case, and pick an arbitrary, possibly non-finite $X$. Then for any fixed $N$, the truncated sequence $X^N = (X_1, X_2, \ldots, X_{N-1}, X_N, X_N, X_N, \ldots)$ is finite, inherits (1.2) and its optimal expected return does not exceed $V$. Since the right-hand side of (1.3) is a nondecreasing function of $V$, the inequality for $X^N$ shows that $E X^*_N$ does not exceed the right-hand side of (1.3) written for the original sequence $X$. It remains to let $N \to \infty$ and use Lebesgue's monotone convergence theorem.

Lemma 2.1
Suppose that $X = (X_n)_{n \ge 1}$ is an arbitrarily dependent finite sequence of random variables satisfying $X_1 \equiv 0$ and (1.2). Then there is a finite supermartingale $Y = (Y_n)_{n \ge 1}$ adapted to the filtration of $X$, which satisfies
$$Y_n \ge X_n \ \text{ almost surely for all } n, \tag{2.1}$$
$$Y_1 = V \tag{2.2}$$
and
$$\sup_{\tau \in \mathcal T} E Y_\tau^p \le t, \qquad \sup_{\tau \in \mathcal T} E Y_\tau \le V. \tag{2.3}$$

Note that the additional assumption $X_1 \equiv 0$ is not restrictive at all: we can always replace the initial sequence $X_1, X_2, \ldots$ with $0, X_1, X_2, \ldots$, and the prophet inequality remains the same. In the proof of the above lemma we will need the notion of essential supremum, a well-known object in optimal stopping theory. Let us briefly recall its definition; for details and properties we refer the reader to the monographs of Peskir and Shiryaev [22] and Shiryaev [23].

Definition 2.1
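Let $(Z_\alpha)_{\alpha \in I}$ be a family of random variables. Then there is a countable subset $J$ of $I$ such that the random variable $Z = \sup_{\alpha \in J} Z_\alpha$ satisfies the following two properties: (i) $P(Z_\alpha \le Z) = 1$ for each $\alpha \in I$; (ii) if $\tilde Z$ is another random variable satisfying $P(Z_\alpha \le \tilde Z) = 1$ for each $\alpha \in I$, then $P(Z \le \tilde Z) = 1$. The random variable $Z$ is called the essential supremum of $(Z_\alpha)_{\alpha \in I}$.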
Proof of Lemma 2.1 We will use some basic facts from optimal stopping theory; for details, we refer the reader to Chapter I in Peskir and Shiryaev [22]. Suppose that $X = (X_1, X_2, \ldots, X_N, X_N, X_N, \ldots)$ is a finite sequence and let $Y$ be the Snell envelope of $X$, i.e., the smallest adapted supermartingale majorizing the sequence $(X_n)_{n \ge 1}$. It is well known that for each $n$ the variable $Y_n$ is given by the formula
$$Y_n = \operatorname*{ess\,sup}_{\sigma \in \mathcal T_n} E(X_\sigma \mid \mathcal F_n),$$
where $\mathcal T_n$ denotes the class of all finite adapted stopping times not smaller than $n$. Since $\sigma \equiv n$ belongs to $\mathcal T_n$, the inequality (2.1) is given for free. To show (2.2), observe that $Y_1$ is a constant random variable, since the $\sigma$-algebra $\mathcal F_1 = \sigma(X_1)$ is trivial. Thus the bound $Y_1 \ge V$ follows directly from the above formula for $Y_1$ and the definition of an essential supremum. On the other hand, for any finite stopping time $\sigma$ we have $V \ge E X_\sigma = E(X_\sigma \mid \mathcal F_1)$ almost surely, which implies $P(V \ge Y_1) = 1$, again from the definition of an essential supremum. This gives (2.2). Since the sequence $X$ stabilizes after $N$ steps, so does $Y$, and therefore the second estimate in (2.3) holds true, directly from (2.2) and the supermartingale property.

To prove the first bound in (2.3), recall that $Y$ can be alternatively defined by the backward induction
$$Y_N = X_N, \qquad Y_n = \max\{X_n, E(Y_{n+1} \mid \mathcal F_n)\}, \quad n = N-1, N-2, \ldots, 1.$$
This implies that $Y_n = E(X_{\tau_n} \mid \mathcal F_n)$, where the stopping time $\tau_n$ is given by
$$\tau_n = \inf\{k \ge n : Y_k = X_k\}.$$
Thus, if we fix an arbitrary $\tau \in \mathcal T$, then
$$Y_\tau = \sum_{n} 1_{\{\tau = n\}} E(X_{\tau_n} \mid \mathcal F_n) = E(X_\sigma \mid \mathcal F_\tau), \qquad \text{where } \sigma = \sum_{n} \tau_n 1_{\{\tau = n\}}.$$
We easily check that $\sigma$ is a stopping time: for any $1 \le k \le N$, the event
$$\{\sigma = k\} = \bigcup_{n=1}^{k} \{\tau = n\} \cap \{\tau_n = k\}$$
belongs to $\mathcal F_k$. Therefore, by the conditional Jensen inequality, $E Y_\tau^p \le E X_\sigma^p$, and the boundedness assumption (1.2) implies $E Y_\tau^p \le t$, as desired.
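The backward induction for the Snell envelope can be sketched on a finite scenario tree. The recursion used below is the standard one, $Y_N = X_N$ and $Y_n = \max\{X_n, E(Y_{n+1} \mid \mathcal F_n)\}$; the representation of the sequence as a list of paths with probabilities, as well as the function names, are assumptions made for this illustration.

```python
from collections import defaultdict


def snell_envelope(paths, probs):
    """Backward induction Y_N = X_N, Y_n = max(X_n, E(Y_{n+1} | F_n)).

    paths: list of equal-length tuples (x_1, ..., x_N), one per scenario.
    probs: strictly positive scenario probabilities summing to 1.
    F_n is taken to be generated by the observed prefix (x_1, ..., x_n).
    """
    horizon = len(paths[0])
    envelope = [list(path) for path in paths]      # terminal condition Y_N = X_N
    for n in range(horizon - 2, -1, -1):           # 0-based index n is time n + 1
        mass = defaultdict(float)
        weighted = defaultdict(float)
        for path, prob, y in zip(paths, probs, envelope):
            prefix = path[: n + 1]
            mass[prefix] += prob
            weighted[prefix] += prob * y[n + 1]
        for path, y in zip(paths, envelope):
            prefix = path[: n + 1]
            continuation = weighted[prefix] / mass[prefix]   # E(Y_{n+1} | F_n)
            y[n] = max(path[n], continuation)
    return envelope


def prophet_and_stopping_values(paths, probs):
    """Return (M, V) = (E max_n X_n, Y_1) for a sequence with constant X_1."""
    envelope = snell_envelope(paths, probs)
    M = sum(prob * max(path) for path, prob in zip(paths, probs))
    V = envelope[0][0]          # Y_1 is constant because F_1 is trivial
    return M, V


# Hypothetical three-scenario example with X_1 = 0.
paths = [(0.0, 1.0, 4.0), (0.0, 1.0, 0.0), (0.0, 3.0, 1.0)]
probs = [0.25, 0.25, 0.5]
M, V = prophet_and_stopping_values(paths, probs)
print(M, V)
```

In this example the optimal rule stops at time 2 when $X_2 = 3$ and continues otherwise, so the value $Y_1$ returned by the recursion agrees with $V$, in line with (2.2).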
Therefore, it suffices to establish the inequality (1.3) under the additional assumption that the process $X$ is a finite supermartingale and the variable $X_1$ is constant almost surely. By standard approximation arguments, we may further restrict ourselves to the class of simple supermartingales; recall that the sequence $X = (X_n)_{n \ge 1}$ is called simple if for each $n$ the random variable $X_n$ takes only a finite number of values. We are ready to apply Burkholder's method, which is introduced in the next section.

Burkholder's method
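Now we describe the main tool which will be used to establish the inequality (1.3). Distinguish the set
$$D = \{(x, y, t) : 0 \le x \le y, \ x^p \le t\}$$
and, for $(x, y, t) \in D$, let $\mathcal S(x, y, t)$ denote the class of all simple nonnegative supermartingales $X$ satisfying $X_1 \equiv x$ and $\sup_\tau E X_\tau^p \le t$, where the supremum is taken over all stopping times adapted to the natural filtration of $X$. Suppose that we are interested in the explicit formula for the function
$$\mathbb B(x, y, t) = \sup E\big(X^*_n \vee y\big),$$
where the supremum is taken over all positive integers $n$ and all $X \in \mathcal S(x, y, t)$.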
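The key idea in the study of this problem is to introduce the class $\mathcal C$ which consists of all functions $B : D \to \mathbb R$ satisfying
$$B(x, y, t) \ge y \quad \text{for any } (x, y, t) \in D, \tag{3.1}$$
and the following concavity-type property: for any $(x, y, t) \in D$, any $\alpha \in (0, 1)$ and any nonnegative numbers $x_\pm$, $t_\pm$ such that $x_\pm^p \le t_\pm$, $\alpha x_- + (1-\alpha) x_+ \le x$ and $\alpha t_- + (1-\alpha) t_+ \le t$, we have
$$B(x, y, t) \ge \alpha B(x_-, x_- \vee y, t_-) + (1-\alpha) B(x_+, x_+ \vee y, t_+). \tag{3.2}$$

We turn to the main result of this section. Recall that the probability space is the interval $[0, 1]$ with its Borel subsets and Lebesgue measure.

Theorem 3.1
For any $B \in \mathcal C$ we have $\mathbb B \le B$ on $D$. Furthermore, the function $\mathbb B$ itself belongs to $\mathcal C$; in other words, $\mathbb B$ is the least element of the class $\mathcal C$.

Proof It is convenient to split the reasoning into two parts.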
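Step 1. First we show that $\mathbb B$ belongs to the class $\mathcal C$. The majorization (3.1) is immediate, since $X^*_n \vee y \ge y$. The main difficulty lies in proving the concavity property (3.2). Fix the parameters $\alpha$, $x$, $y$, $t$, $x_\pm$, $t_\pm$ as in the statement and pick arbitrary supermartingales $X^- \in \mathcal S(x_-, x_- \vee y, t_-)$, $X^+ \in \mathcal S(x_+, x_+ \vee y, t_+)$. We splice these two processes into one sequence $X = (X_n)_{n \ge 1}$ by setting $X_1 \equiv x$ and, for $n \ge 2$,
$$X_n(\omega) = \begin{cases} X^-_{n-1}(\omega/\alpha) & \text{if } \omega \in [0, \alpha), \\ X^+_{n-1}\big((\omega - \alpha)/(1-\alpha)\big) & \text{if } \omega \in [\alpha, 1]. \end{cases}$$
Then $X$ is a nonnegative supermartingale (with respect to its natural filtration), because the processes $X^\pm$ have this property and $\alpha x_- + (1-\alpha) x_+ \le x$. Furthermore, for any stopping time $\tau$ of $X$ we have $E X_\tau^p \le t$. To see this, we consider two cases.

1. If $P(\tau = 1) > 0$, then the set $\{\tau \le 1\}$ is nonempty; combining this with the facts that $X_1$ is constant and $\tau$ is a stopping time of $X$, we see that $\{\tau \le 1\} = \Omega$, or $\tau \equiv 1$. Then $E X_\tau^p = x^p \le t$ by the definition of $D$.

2. Suppose that $\{\tau = 1\} = \emptyset$, or $\tau \ge 2$ almost surely. Then we easily verify that the variables $\tau^\pm$, given by
$$\tau^-(\omega) = \tau(\alpha\omega) - 1, \qquad \tau^+(\omega) = \tau\big(\alpha + (1-\alpha)\omega\big) - 1,$$
are stopping times of $X^-$ and $X^+$. Therefore,
$$E X_\tau^p = \alpha E\big(X^-_{\tau^-}\big)^p + (1-\alpha) E\big(X^+_{\tau^+}\big)^p \le \alpha t_- + (1-\alpha) t_+ \le t.$$

Hence $X \in \mathcal S(x, y, t)$. Since $x \le y$, we have $X^*_n \vee y = \sup_{2 \le k \le n} X_k \vee y$ and thus
$$\mathbb B(x, y, t) \ge E\big(X^*_n \vee y\big) = \alpha E\big((X^-)^*_{n-1} \vee y\big) + (1-\alpha) E\big((X^+)^*_{n-1} \vee y\big).$$
Take the supremum over all $n$ and $X^\pm$ as above to obtain the desired bound (3.2).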
Step 2. Now suppose that $B$ is an arbitrary element of $\mathcal C$; we must prove that $\mathbb B \le B$, where $\mathbb B$ is the function defined by the supremum above.
To do this, rephrase the condition (3.2) as follows. Suppose that $(X, T)$ is an arbitrary random variable with a two-point distribution, such that $X \ge 0$ and $P(X^p \le T) = 1$. Then for any $(x, y, t) \in D$ such that $E X \le x$ and $E T \le t$, we have

$$B(x, y, t) \ge E B(X, X \vee y, T). \tag{3.3}$$
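Note that the set $\{(x, t) : x^p \le t\}$ is convex. Therefore, by straightforward induction, the above inequality extends to the case when $(X, T)$ is an arbitrary simple random variable satisfying $X^p \le T$ with probability 1. Now, pick $X \in \mathcal S(x, y, t)$ and consider the sequence $(X, Y, T)$, where $Y_n = X^*_n \vee y$ and $T_n = \operatorname*{ess\,sup}_{\tau \in \mathcal T_n} E(X_\tau^p \mid \mathcal F_n)$ (here $\mathcal T_n$ denotes the class of all stopping times of $X$ not smaller than $n$). Then the process $(B(X_n, Y_n, T_n))_{n \ge 1}$ is a supermartingale: to see this, fix $n \ge 1$ and apply (3.3) conditionally with respect to $\mathcal F_n$, with $\tilde x := X_n$, $\tilde y := Y_n$, $\tilde t := T_n$, $\tilde X := X_{n+1}$ and $\tilde T := T_{n+1}$. Let us verify the assumptions: the inequalities $\tilde x^p \le \tilde t$, $\tilde x \le \tilde y$ and $\tilde X^p \le \tilde T$ are evident; the inequalities $E(\tilde X \mid \mathcal F_n) \le \tilde x$ and $E(\tilde T \mid \mathcal F_n) \le \tilde t$ follow from the supermartingale property of $X$ and $T$ ($T$ is a supermartingale, since it is the Snell envelope of the sequence $(X_n^p)_{n \ge 1}$). Thus, since $Y_{n+1} = X_{n+1} \vee Y_n$, (3.3) yields
$$E\big[B(X_{n+1}, Y_{n+1}, T_{n+1}) \mid \mathcal F_n\big] \le B(X_n, Y_n, T_n).$$
Combining this with (3.1) yields

$$E\big(X^*_n \vee y\big) = E Y_n \le E B(X_n, Y_n, T_n) \le B(X_1, Y_1, T_1) = B(x, y, T_1) \le B(x, y, t).$$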
Here in the last inequality we have used the fact that the function $t \mapsto B(x, y, t)$ is nondecreasing; this follows immediately from (3.2), applied to $x_+ = x_- = x$ and $t_+ = t_- < t$. Taking the supremum over all $n$ and all $X \in \mathcal S(x, y, t)$, we obtain the bound $\mathbb B(x, y, t) \le B(x, y, t)$. This finishes the proof.

Proof of Theorem 1.1
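We will prove that the function $\mathbb B$ admits the following explicit formula:
$$\mathbb B(x, y, t) = \begin{cases} y + \dfrac{x}{p-1} \log\left(\dfrac{te}{x y^{p-1}}\right) & \text{if } y \le (t/x)^{1/(p-1)}, \\[2mm] y + \dfrac{t}{(p-1) y^{p-1}} & \text{if } y > (t/x)^{1/(p-1)} \end{cases}$$
(for $x = 0$ the right-hand side is understood as $y$). This will clearly yield the assertion of Theorem 1.1, which is nothing else but the explicit formula for $\mathbb B(V, V, t)$. Denote the expression on the right above by $B(x, y, t)$.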

Proof of $\mathbb B \le B$
In the light of Theorem 3.1, it suffices to verify that $B \in \mathcal C$. The condition (3.1) is obvious, and the main problem is to establish (3.2). First, we easily check that the functions $x \mapsto B(x, x \vee y, t)$ and $t \mapsto B(x, y, t)$ are nondecreasing. Consequently, in the proof of (3.2) we may restrict ourselves to the case $x = \alpha x_- + (1-\alpha) x_+$ and $t = \alpha t_- + (1-\alpha) t_+$. Since the region $\{(x, t) : x^p \le t\}$ is convex, it is enough to prove the following. For any $h \in \mathbb R$ and any $(x, y, t) \in D$, the function $\varphi(s) = B(x + sh, (x + sh) \vee y, t + s)$ (defined for those $s$ for which $(x + sh, (x + sh) \vee y, t + s) \in D$) is concave.

We start by observing that $\varphi$ is of class $C^1$; this follows immediately from the fact that $B$ is of class $C^1$ and $B_y(x, x, t) = 0$ (the latter condition guarantees that the one-sided derivatives $\varphi'(s-)$ and $\varphi'(s+)$ match when $x + sh = y$). To deal with the concavity of $\varphi$ on the set $\{s : x + sh \le y\}$, we must prove that the matrix
$$\begin{bmatrix} B_{xx}(x + sh, y, t + s) & B_{xt}(x + sh, y, t + s) \\ B_{xt}(x + sh, y, t + s) & B_{tt}(x + sh, y, t + s) \end{bmatrix}$$
(defined for $y \ne ((t + s)/(x + sh))^{1/(p-1)}$) is nonpositive-definite. Substituting $\tilde x = x + sh$ and $\tilde t = t + s$ if necessary, we may assume that $s = 0$. Now, if $y < (t/x)^{1/(p-1)}$, then the matrix equals
$$\frac{1}{p-1} \begin{bmatrix} -1/x & 1/t \\ 1/t & -x/t^2 \end{bmatrix},$$
which clearly has the required property (its determinant vanishes and its diagonal entries are negative); if $y > (t/x)^{1/(p-1)}$, the situation is even simpler, since all the entries of the matrix are 0. Finally, it remains to show the concavity of $\varphi$ on $\{s : x + sh > y\}$. A direct differentiation yields
$$\varphi''(s) = -\frac{1}{(p-1)(x + sh)} \left[ \left( \frac{x + sh}{t + s} - h \right)^2 + (p-1) h^2 \right] \le 0$$
and the claim follows.

Proof of $\mathbb B \ge B$
Now we will use the second half of Theorem 3.1, which states that $\mathbb B \in \mathcal C$. We will also exploit the following additional homogeneity property of $\mathbb B$: for any $(x, y, t) \in D$ and $\lambda > 0$ we have
$$\mathbb B(\lambda x, \lambda y, \lambda^p t) = \lambda \mathbb B(x, y, t). \tag{4.1}$$
This condition follows at once from the very definition of $\mathbb B$ and the fact that $X \in \mathcal S(x, y, t)$ if and only if $\lambda X \in \mathcal S(\lambda x, \lambda y, \lambda^p t)$.
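The structural properties used in this section can be spot-checked numerically on a candidate closed form. The sketch below assumes the candidate $B(x, y, t) = y + \frac{x}{p-1}\log\frac{te}{x y^{p-1}}$ for $y \le (t/x)^{1/(p-1)}$ and $B(x, y, t) = y + \frac{t}{(p-1)y^{p-1}}$ otherwise; this formula, the sampling ranges and all function names are assumptions of the illustration. It tests the majorization (3.1), the homogeneity (4.1) and midpoint concavity of $s \mapsto B(x + sh, (x + sh) \vee y, t + s)$ at random points; it is a sanity check, not a proof.

```python
import math
import random


def B_candidate(x, y, t, p):
    """Assumed closed form; requires 0 < x <= y and x**p <= t."""
    threshold = (t / x) ** (1.0 / (p - 1))
    if y <= threshold:
        return y + x / (p - 1) * (1.0 + math.log(t / (x * y ** (p - 1))))
    return y + t / ((p - 1) * y ** (p - 1))


def phi(s, x, y, t, h, p):
    """The one-dimensional section s -> B(x + s h, (x + s h) v y, t + s)."""
    u = x + s * h
    return B_candidate(u, max(u, y), t + s, p)


def check_properties(p=2.5, trials=2000, seed=0, s=0.01):
    rng = random.Random(seed)
    majorizes = True
    homogeneity_error = 0.0
    concavity_gap = 0.0        # largest value of (average of endpoints) - midpoint
    for _ in range(trials):
        x = rng.uniform(0.1, 2.0)
        y = x + rng.uniform(0.0, 2.0)
        t = x ** p + rng.uniform(0.0, 3.0)
        lam = rng.uniform(0.5, 2.0)
        b = B_candidate(x, y, t, p)
        majorizes = majorizes and (b >= y - 1e-12)
        scaled = B_candidate(lam * x, lam * y, lam ** p * t, p)
        homogeneity_error = max(homogeneity_error, abs(scaled - lam * b))
        h = rng.uniform(-1.0, 1.0)
        # Keep both endpoints of the segment inside the domain x^p <= t.
        if (x + s * abs(h)) ** p <= t - s:
            ends = 0.5 * (phi(-s, x, y, t, h, p) + phi(s, x, y, t, h, p))
            concavity_gap = max(concavity_gap, ends - phi(0.0, x, y, t, h, p))
    return majorizes, homogeneity_error, concavity_gap


print(check_properties())
```

A concavity gap at the level of rounding error is what one expects if the candidate satisfies the concavity requirement along these sections.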
For the sake of clarity, we have split the reasoning into a few parts.
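Step 1. Let $\delta$ be a small positive number. Using (3.2), we can write
$$\mathbb B(1, 1, 1) \ge (1+\delta)^{-p}\, \mathbb B\big(1+\delta, 1+\delta, (1+\delta)^p\big) + \big(1 - (1+\delta)^{-p}\big)\, \mathbb B(0, 1, 0). \tag{4.2}$$
Thus, using (3.1) and (4.1), the right-hand side is not smaller than
$$(1+\delta)^{1-p}\, \mathbb B(1, 1, 1) + 1 - (1+\delta)^{-p}. \tag{4.3}$$
Combining the above facts, we get $\mathbb B(1, 1, 1) \ge \big(1 - (1+\delta)^{-p}\big)/\big(1 - (1+\delta)^{1-p}\big)$, and letting $\delta \to 0$ gives
$$\mathbb B(1, 1, 1) \ge \frac{p}{p-1}. \tag{4.4}$$

Step 2. Now we provide a lower bound for $\mathbb B(1, 1, t)$, where $t$ is larger than 1. We argue as in the previous step, applying (3.2) and combining it with (3.1) and (4.1). Precisely, we fix a small positive $\delta$ with $(1+\delta)^{p-1} \le t$ and write
$$\mathbb B(1, 1, t) \ge \frac{1}{1+\delta}\, \mathbb B\big(1+\delta, 1+\delta, t(1+\delta)\big) + \frac{\delta}{1+\delta}\, \mathbb B(0, 1, 0) \ge \mathbb B\big(1, 1, t(1+\delta)^{1-p}\big) + \frac{\delta}{1+\delta}. \tag{4.5}$$
By induction, this implies
$$\mathbb B(1, 1, t) \ge \mathbb B\big(1, 1, t(1+\delta)^{-n(p-1)}\big) + \frac{n\delta}{1+\delta},$$
if only $(1+\delta)^{n(p-1)} \le t$. Now we fix a large positive integer $n$, put $\delta = t^{1/(n(p-1))} - 1$ (so that $(1+\delta)^{n(p-1)} = t$) and let $n \to \infty$. Then the above bound gives $\mathbb B(1, 1, t) \ge \mathbb B(1, 1, 1) + \frac{\log t}{p-1}$, which combined with (4.4) yields
$$\mathbb B(1, 1, t) \ge 1 + \frac{1}{p-1} \log(te). \tag{4.6}$$

Step 3. The next move in our analysis is to prove the estimate $\mathbb B(x, y, t) \ge B(x, y, t)$ for $y \le (t/x)^{1/(p-1)}$. We proceed as previously: first apply (3.2) to obtain
$$\mathbb B(x, y, t) \ge \frac{x}{y}\, \mathbb B\Big(y, y, \frac{ty}{x}\Big) + \Big(1 - \frac{x}{y}\Big)\, \mathbb B(0, y, 0)$$
(here we use the assumption $y \le (t/x)^{1/(p-1)}$; if this inequality does not hold, the point $(y, y, ty/x)$ does not belong to $D$ and $\mathbb B(y, y, ty/x)$ does not make sense). Next, using (3.1), (4.1) and finally (4.6), we get
$$\mathbb B(x, y, t) \ge x\, \mathbb B\Big(1, 1, \frac{t}{x y^{p-1}}\Big) + y - x \ge y + \frac{x}{p-1} \log\left(\frac{te}{x y^{p-1}}\right) = B(x, y, t).$$

Step 4. Now we will deal with $\mathbb B(1, y, 1)$ for $y > 1$. By (3.2), (3.1) and (4.1),
$$\mathbb B(1, y, 1) \ge y^{-p}\, \mathbb B(y, y, y^p) + (1 - y^{-p})\, \mathbb B(0, y, 0) \ge y^{1-p}\, \mathbb B(1, 1, 1) + y - y^{1-p}.$$
Combining this with (4.4), we obtain
$$\mathbb B(1, y, 1) \ge y + \frac{y^{1-p}}{p-1}. \tag{4.7}$$

Step 5. This is the final part. Pick $(x, y, t) \in D$ such that $y > (t/x)^{1/(p-1)}$ and apply (3.2) and then (3.1) to get
$$\mathbb B(x, y, t) \ge (1-\alpha)\, \mathbb B\Big(\frac{x}{1-\alpha}, y, \frac{t}{1-\alpha}\Big) + \alpha y, \qquad \text{where } \alpha = 1 - \Big(\frac{x^p}{t}\Big)^{1/(p-1)}.$$
For this choice of $\alpha$, we have $x^p = t(1-\alpha)^{p-1}$ and hence, by (4.1) and (4.7),
$$\mathbb B\Big(\frac{x}{1-\alpha}, y, \frac{t}{1-\alpha}\Big) = \frac{x}{1-\alpha}\, \mathbb B\Big(1, \frac{(1-\alpha) y}{x}, 1\Big) \ge y + \frac{t}{(1-\alpha)(p-1) y^{p-1}},$$
so that $\mathbb B(x, y, t) \ge y + \frac{t}{(p-1) y^{p-1}} = B(x, y, t)$. This completes the proof of the inequality $\mathbb B \ge B$ on the whole domain. Thus, Theorem 1.1 follows.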
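The limiting procedure of Step 2 can be illustrated numerically. With $\delta = t^{1/(n(p-1))} - 1$ the identity $(1+\delta)^{n(p-1)} = t$ holds by construction, and the accumulated gain over $n$ induction steps approaches $\log t/(p-1)$. The exact per-step increment is treated here as an assumption: the sketch looks at the two natural candidates $n\delta/(1+\delta)$ and $n\delta$, which bracket the limit from below and above. The function name and the sample parameters are illustrative.

```python
import math


def step2_quantities(t, p, n):
    """Return (delta, growth, lower, upper, limit) for the choice of Step 2."""
    delta = t ** (1.0 / (n * (p - 1))) - 1.0
    growth = (1.0 + delta) ** (n * (p - 1))     # should reproduce t
    lower = n * delta / (1.0 + delta)           # assumed accumulated gain
    upper = n * delta                           # cruder variant, same limit
    limit = math.log(t) / (p - 1)
    return delta, growth, lower, upper, limit


for n in (10, 1000, 100000):
    print(n, step2_quantities(4.0, 3.0, n))
```

As $n$ grows, both candidates squeeze the value $\log t/(p-1)$, which is the logarithmic term appearing in the lower bound for the function at the point $(1, 1, t)$.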

On the construction of extremal examples
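The arguments presented in Steps 1-5 can be easily translated into the construction of extremal supermartingales $X$ ("extremizers") corresponding to $\mathbb B(x, y, t)$, i.e., those for which the supremum defining $\mathbb B(x, y, t)$ is almost attained. The purpose of this section is to explain how to extract this construction from the above calculations. The reasoning will be a little informal, as our aim is to present the idea of the connection. First we look at the value $\mathbb B(1, 1, 1)$, which was studied in Step 1. What about the extremal $X \in \mathcal S(1, 1, 1)$? For a fixed $\delta > 0$, consider a Markov process $((X_n, Y_n, T_n))_{n \ge 0}$ starting from $(1, 1, 1)$, satisfying the following two requirements.

(i) A state of the form $(\lambda, \lambda, \lambda^p)$, $\lambda > 0$, leads to $\big((1+\delta)\lambda, (1+\delta)\lambda, (1+\delta)^p \lambda^p\big)$ with probability $(1+\delta)^{-p}$ and to $(0, \lambda, 0)$ with probability $1 - (1+\delta)^{-p}$.

(ii) All states of the form $(0, \lambda, 0)$ are absorbing.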
Then one can check that the process $X$ is a supermartingale. If we stop it after a large number $N$ of steps, we obtain the desired extremizer: that is, if we take sufficiently small $\delta$ and sufficiently large $N$, then $E X^*_N$ can be made arbitrarily close to $\mathbb B(1, 1, 1)$. One might wonder why we have introduced the more complicated three-dimensional process $(X, Y, T)$ above. The reason is that the moves described in (i) and (ii) are closely related to the inequality (4.2) and the bound (4.3) on which Step 1 rests. To explain this more precisely, note first that (4.2) encodes the Markov move from (i): to make this more apparent, combine (4.2) with (4.1) to get
$$\mathbb B(\lambda, \lambda, \lambda^p) \ge (1+\delta)^{-p}\, \mathbb B\big((1+\delta)\lambda, (1+\delta)\lambda, (1+\delta)^p \lambda^p\big) + \big(1 - (1+\delta)^{-p}\big)\, \mathbb B(0, \lambda, 0).$$
Thus, a starting state appears on the left, while the destination states can be found on the right, with the transition probabilities serving as the corresponding weights. The condition (ii) is connected to the bound (4.3): generally speaking, all the states $(x, y, t)$ at which we use the majorization $\mathbb B(x, y, t) \ge y$ in the above considerations need to be assumed absorbing.
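Next, consider the value $\mathbb B(1, 1, t)$ for $t > 1$, which was studied in Step 2. The first inequality in (4.5), combined with (4.1), suggests the following move.

(i') A state of the form $(\lambda, \lambda, s\lambda^p)$ with $\lambda > 0$ and $s > 1$ leads to $\big((1+\delta)\lambda, (1+\delta)\lambda, (1+\delta) s \lambda^p\big)$ with probability $1/(1+\delta)$ and to $(0, \lambda, 0)$ with probability $\delta/(1+\delta)$.

Furthermore, in the second inequality of (4.5), we use the bound $\mathbb B(0, \lambda, 0) \ge \lambda$; this suggests that the requirement (ii) above should remain valid. Now, suppose that $\delta$ is chosen as in Step 2: $\delta = t^{1/(n(p-1))} - 1$ for some large positive integer $n$. Then, after $n$ steps, the process $(X, Y, T)$ gets to the point $\big((1+\delta)^n, (1+\delta)^n, (1+\delta)^{np}\big)$ with positive probability. At this point the procedure described in (i') does not apply, since the point is not of the appropriate form. In Step 2 we encountered a similar phenomenon: the number $\mathbb B(1, 1, 1)$ came into play and the arguments of Step 2 did not apply. To overcome this difficulty, we exploited Step 1. Here we do the same and apply the procedure (i) to the point $\big((1+\delta)^n, (1+\delta)^n, (1+\delta)^{np}\big)$. In other words, the Markov process $(X, Y, T)$ is given by the starting position $(1, 1, t)$ and the conditions (i), (ii) and (i'). It remains to stop the process $X$ after a large number of steps to obtain the extremizer corresponding to $\mathbb B(1, 1, t)$.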
The remaining extremal processes, corresponding to the values of B at remaining points, are found similarly. We leave the details to the reader.

Lack of prophet inequalities for $L^p$-bounded variables
In the final part of the paper we show that if the condition (1.2) is replaced by Consequently, by the second estimate in (5.2), the sequence X satisfies EX * ≥ sup τ ∈T EX τ + K .
Since K was arbitrary, no universal prophet inequality holds under (5.1). This completes the proof.