Asymptotic Expansions and Strategies in the Online Increasing Subsequence Problem

We study two closely related problems in the online selection of an increasing subsequence. In the first problem, introduced by Samuels and Steele (Ann. Probab. 9(6):937–947, 1981), the objective is to maximise the length of a subsequence selected by a nonanticipating strategy from a random sample of given size n. In the dual problem, recently studied by Arlotto et al. (Random Struct. Algorithms 49:235–252, 2016), the objective is to minimise the expected time needed to choose an increasing subsequence of given length k from a sequence of infinite length. Developing a method based on the monotonicity of the dynamic programming equation, we derive two-term asymptotic expansions for the optimal values, with O(1) remainder in the first problem and O(k) in the second. Settling a conjecture in Arlotto et al. (Random Struct. Algorithms 52:41–53, 2018), we also design selection strategies that achieve optimality within these bounds and are, in a sense, best possible.


Introduction
The online increasing subsequence problems are stochastic optimisation problems concerned with non-anticipating policies aimed at selecting an increasing subsequence from a sequence of random items X_1, X_2, … with known continuous distribution F, which, without loss of generality, will be assumed uniform on [0, 1]. The online constraint requires the decision to accept or reject X_i to be made at time i, when the item is observed, and to be immediately terminal. Formally, an online policy for selecting an increasing subsequence is a collection of stopping times τ = (τ_1, τ_2, …) adapted to the sequence of sigma-fields F_i = σ{X_1, X_2, …, X_i}, 1 ≤ i < ∞, and satisfying (i) τ_1 < τ_2 < ⋯, (ii) X_{τ_1} < X_{τ_2} < ⋯.
We denote the space of all online policies by T. (Correspondence: A. Seksenbayev, a.seksenbayev@qmul.ac.uk.)
In the first problem, introduced by Samuels and Steele [20], the objective is to maximise the expected length of an increasing subsequence selected from the first n items. The performance of an online policy τ is measured by E L_n(τ), where

L_n(τ) := max{j : τ_1 < ⋯ < τ_j ≤ n, X_{τ_1} < ⋯ < X_{τ_j}}

is the length of the selected increasing subsequence. Samuels and Steele proved that the maximal expected length, v_n = sup_{τ∈T} E L_n(τ), satisfies

v_n ∼ √(2n), n → ∞.   (1)
It was later observed that the Samuels–Steele problem is equivalent to a special case of the online bin-packing problem [11,18], with uniformly distributed weights packed into a unit-sized bin. This connection was useful in showing that √(2n) is, in fact, an upper bound [9] (see also [13] for an alternative approach).
For comparison, a clairvoyant decision-maker with a complete overview of the data could choose the longest increasing subsequence. The study of the statistical properties of its length l(n) is known as the Ulam–Hammersley problem. After years of exciting development, Baik et al. [5] showed that E l(n) = 2√n + c n^{1/6} + o(n^{1/6}) as n → ∞, with c = −1.758…, and proved that the distribution of (l(n) − 2√n)/n^{1/6} converges to the Tracy–Widom distribution from random matrix theory. See Romik [19] for an excellent account.
To prove the existence of the limit of v_n/√n, Samuels and Steele also introduced an analogous problem with arrivals by a Poisson process and the objective of completing the selection within a given time horizon [0, n]. The connection with the poissonised version has been useful for obtaining asymptotic results in the fixed-n setting. Arlotto et al. [2] proved a central limit theorem for L_n(τ*), where τ* is the optimal selection policy, analogous to the Bruss–Delbaen result [8] in the poissonised setting. The tightest previously known bounds on v_n are

√(2n) − 2 log n − 2 ≤ v_n < √(2n).   (2)
The lower bound was shown recently in [4]. By assessing a suboptimal selection policy with an acceptance window that depends on both the size of the last selection and the number of items yet to be observed, Arlotto et al. suggested that the optimality gap in (2) can be further tightened. They also obtained numerical evidence that the performance of the employed policy is within O(1) of the optimum.
In a dual problem, studied recently by Arlotto et al. [3], the objective is to minimise E τ_k, the expected time needed to select an increasing subsequence of fixed length k from an infinitely long series of observations. For the optimal expected time

β_k = inf_{τ∈T} E τ_k,

Arlotto et al. proved the bounds

k²/2 ≤ β_k ≤ k²/2 + O(k log k),   (3)

so that β_k ∼ k²/2. The bounds were obtained by an analytical investigation of the optimality recursion.
The quickest selection problem of Arlotto et al. [3] is equivalent to a special case of the sum-constraint problem of Chen et al. [10] (see Example 2 on p. 541 there for the k²/2 asymptotics, and Mallows et al. [15] for a multidimensional extension), so the principal asymptotics k²/2 can be read off from this earlier work. Coffman et al. [11] showed in Sect. 6 that the same k²/2 asymptotics also occur in the offline quickest selection problem.
In the present paper we adapt an asymptotic comparison method, used before in the poissonised setting [6,14], to approximate solutions of the optimality equations. The method can also be used to estimate the performance of a certain class of suboptimal policies. We refine the cited results as follows. For the longest increasing subsequence problem, we prove that

v_n = √(2n) − (1/12) log n + O(1), n → ∞.   (4)

A similar expansion, with second term (log n)/6, was obtained in the related problem of online selection from a random permutation of n integers by Peng and Steele [17]. The difference in the logarithmic terms can be interpreted as the advantage of a better-informed decision-maker, who knows the values of the order statistics of {X_1, …, X_n} but not the succession in which the items are revealed in the course of observation. Furthermore, we settle the conjecture from [4] by showing that the performance of the policy used there is indeed within O(1) of the maximum v_n.
For the quickest selection problem, we prove that

β_k = k²/2 + (1/6) k log k + O(k), k → ∞.   (5)

Given the natural duality of the two problems, one may suspect that the functions n ↦ v_n and k ↦ β_k are asymptotic inverses of one another. In the principal asymptotics, this is obvious from (1) and (3). However, the refined expansions (4) and (5) reveal a more intimate connection between the problems: inverting the expansion of v_n gives the two-term expansion of β_k. For the quickest selection problem, we also introduce selection policies that are asymptotically optimal. The first is a variation of the constant-window policy, resembling the one from [20], and achieves the principal asymptotics k²/2. The second is a more complex policy with optimality gap O(k), which is, in fact, the best one can do asymptotically, setting aside the behaviour for small k.

The Longest Subsequence Problem
In the Samuels–Steele problem, n is the model parameter representing the total size of the random sample. After the first item is observed, the problem reduces to a selection from n − 1 random items. Furthermore, if the last selected item has size z ∈ [0, 1], all future observations that fall below z are automatically discarded. We denote by v_m(z), m = 1, …, n, the maximal expected length of an increasing subsequence when the remaining sample size is m and the last selected item has size z. The functions v_m : [0, 1] → R_+ are called the optimal value functions in the longest subsequence selection.
In addition, we define a subclass of online selection policies that have a variable acceptance window. At each step of the selection, there is a threshold function h_m : [0, 1] → [0, 1] with 0 ≤ h_m(z) ≤ 1 − z that shapes the decision of whether to accept or reject the current observation: the corresponding policy accepts an observation of size x if and only if it falls into the acceptance window [z, z + h_m(z)], where z is the size of the last selected item and m is the number of items yet to be observed. Setting τ_0 = 0 and X_{τ_0} = 0, such a policy corresponds to a sequence of stopping times τ = (τ_1, τ_2, …) defined recursively as the successive times at which an observation falls into the current acceptance window, with the convention that min ∅ = ∞.

Optimality Equation for the Value Function
The optimality equation is a well-known recursion [2,4,20]:

v_{n+1}(z) = z v_n(z) + ∫_z^1 max{v_n(x) + 1, v_n(z)} dx,   (6)

with v_0(z) = 0. Note that v_n(z) + c also satisfies (6) for any constant c. We provide here the intuition behind the optimality equation (6). Assume we are at the selection stage with n + 1 observations to inspect and the last selection has size z; this corresponds to the value function v_{n+1}(z) on the left-hand side. With probability z the next observation is below z, leaving us with the expected length v_n(z); this explains the first term on the right-hand side. Should the next observation x be admissible, the dynamic programming principle prescribes choosing x if and only if v_n(x) + 1 ≥ v_n(z). Hence, the optimal decision yields max{v_n(x) + 1, v_n(z)}. Averaging over the uniformly distributed x gives (6). Observe that v_n(z) is the maximum expected length of an increasing subsequence chosen from N items, where N d= Bin(n, 1 − z) (see [13], p. 945, and [20]). Define the threshold h_n(z) as the solution to

v_n(z + h_n(z)) + 1 = v_n(z)   (7)

if v_n(z) > 1, and set h_n(z) = 1 − z otherwise. Then the monotonicity of v_n(z) in z implies that the integrand in (6) equals v_n(x) + 1 on the interval [z, z + h_n(z)], while on the remaining interval [z + h_n(z), 1] it assumes the value v_n(z). This yields the form of the optimal selection policy: accept the observation x if it falls into the acceptance window [z, z + h_n(z)]. From (7) it is seen that the acceptance window is updated dynamically with every observation. Thus, the optimal policy indeed belongs to the class of policies with a variable acceptance window, and we call the functions h_n(z) the optimal threshold functions. Note that equation (7) has a solution only when v_n(z) > 1. This has a logical interpretation: when v_n(z) ≤ 1, the decision-maker should select every successive record, as this provides the largest expected payoff.
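As a sanity check, the recursion (6) can be iterated numerically on a grid. The sketch below is our own illustration (grid size, horizon and the trapezoidal quadrature are arbitrary choices, not taken from the paper); the computed v_n(0) stays below √(2n) and v_n(z) decreases in z, as the theory predicts.

```python
import math

# Iterate the optimality recursion (6):
#   v_{n+1}(z) = z*v_n(z) + \int_z^1 max{v_n(x)+1, v_n(z)} dx,  v_0 = 0,
# on a uniform grid of [0, 1], with the integral computed by the trapezoidal rule.
M = 200                       # number of grid cells
dz = 1.0 / M
grid = [i * dz for i in range(M + 1)]
v = [0.0] * (M + 1)           # v_0(z) = 0

N = 100                       # sample size
for n in range(N):
    new = [0.0] * (M + 1)
    for i in range(M + 1):
        vi = v[i]
        # trapezoidal rule for the integral over [z_i, 1]
        f = [max(v[j] + 1.0, vi) for j in range(i, M + 1)]
        integral = dz * (sum(f) - 0.5 * (f[0] + f[-1])) if len(f) > 1 else 0.0
        new[i] = grid[i] * vi + integral
    v = new

v_n0 = v[0]                   # approximation of the optimal value v_100
print(round(v_n0, 3), round(math.sqrt(2 * N), 3))
```

The run confirms the upper bound v_n < √(2n) and the monotonicity of v_n(z) used in the derivation of the threshold (7).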
In the sequel we work directly with the optimality equation to refine the asymptotics of the value functions. The comparison method we employ hinges on certain monotonicity properties of (6). Let G_n be operators acting on continuous bounded functions f : [0, 1] → R_+ and possessing the following properties:

(i) shift-invariance: G_n(f + c) = G_n(f) + c for every constant c;
(ii) monotonicity: if f ≤ f̃ pointwise, then G_n(f) ≤ G_n(f̃).

To describe the range of limit regimes for (n, z), we introduce the size function g_n(z) := n(1 − z). We say that a sequence of functions (f_n) is locally bounded from above if sup_{(n,z): g_n(z) ≤ c} f_n(z) < ∞ for every c > 0, locally bounded from below if (−f_n) is locally bounded from above, and locally bounded if (|f_n|) is locally bounded from above. With this in mind, we present the following key lemma, in which (f_n) denotes a solution of the recursion f_{n+1} = G_{n+1}(f_n).
Lemma 1 If (f_n) is locally bounded from above, while a sequence (f̃_n) is locally bounded from below and satisfies f̃_{n+1}(z) ≥ G_{n+1}(f̃_n)(z) whenever g_n(z) is sufficiently large, then the difference f_n(z) − f̃_n(z) is bounded from above uniformly in n and z.

Similarly, if (f_n) is locally bounded from below, while (f̃_n) is locally bounded from above and satisfies f̃_{n+1}(z) ≤ G_{n+1}(f̃_n)(z) whenever g_n(z) is sufficiently large, then f_n(z) − f̃_n(z) is bounded from below uniformly in n and z.
Proof Adding a constant if necessary and using the shift-invariance property (i) of the operator G_n, we can reduce to the case f_n(z) > 0.
If the first claim is not true, then for every constant c > 0 we can find a pair (n_0, z_0), with n_0 = n_0(c) the smallest such index, for which f_{n_0+1}(z_0) − f̃_{n_0+1}(z_0) > c. From the local boundedness of (f_n) we conclude that n_0(c) → ∞ as c → ∞.

Moreover, since (f_n) is locally bounded from above, we have g_{n_0+1}(z_0) ≥ f_{n_0+1}(z_0) > c, so we can choose c large enough to guarantee that the comparison inequality f̃_{n_0+1}(z_0) ≥ G_{n_0+1}(f̃_{n_0})(z_0) is available at (n_0 + 1, z_0). Therefore, appealing to the shift-invariance (i), we obtain G_{n_0+1}(f̃_{n_0} + c)(z_0) = G_{n_0+1}(f̃_{n_0})(z_0) + c ≤ f̃_{n_0+1}(z_0) + c < f_{n_0+1}(z_0). However, by the minimality of n_0, we have f_{n_0}(z_0) < f̃_{n_0}(z_0) + c. Thus, from the monotonicity property (ii), it follows that f_{n_0+1}(z_0) = G_{n_0+1}(f_{n_0})(z_0) ≤ G_{n_0+1}(f̃_{n_0} + c)(z_0) < f_{n_0+1}(z_0), which is a contradiction. Now, to prove the second part of the lemma, assume to the contrary that the difference f_n(z) − f̃_n(z) is unbounded from below. Then, for every constant c > 0, one can find a pair (n_1, z_1), with n_1 minimal, such that f̃_{n_1+1}(z_1) − f_{n_1+1}(z_1) > c. A symmetric argument, using (i), (ii) and the inequality f̃_{n_1+1} ≤ G_{n_1+1}(f̃_{n_1}) valid for g_{n_1}(z_1) large, again leads to a contradiction with (8). This concludes the proof.

Asymptotic Expansion of v n
We utilise Lemma 1 to compare v n (z) with a sequence of carefully chosen test functions.
With each iteration of the method, we obtain a finer asymptotic expansion of v_n(z). Since we do not employ the initial condition v_0(z) = 0, the maximal accuracy of the comparison method is limited to an O(1) term. Having obtained the desired expansion of v_n(z), we specialise it to z = 0, thus deriving (4). The whole procedure is reminiscent of the familiar method of successive approximations for solutions of differential equations (see, for example, [12], Sect. 9.1).
Introduce the operator Ψ, acting on continuous bounded functions f : [0, 1] → R_+ as

Ψ f(z) := z f(z) + ∫_z^1 max{f(x) + 1, f(z)} dx.

With this notation, the optimality equation (6) assumes the form v_{n+1} = Ψ v_n. The operator Ψ is shift-invariant and monotone in the sense of properties (i) and (ii), so Lemma 1 yields the following result: if a locally bounded sequence of test functions (ṽ_n) satisfies ṽ_{n+1}(z) > Ψ ṽ_n(z) whenever n(1 − z) is sufficiently large, then v_n(z) − ṽ_n(z) is bounded from above uniformly in n and z; likewise, if ṽ_{n+1}(z) < Ψ ṽ_n(z), then v_n(z) − ṽ_n(z) is bounded from below uniformly in n and z.
To obtain the principal asymptotics, consider the test function ṽ^(0)_n(z) = γ_0 √(n(1−z)), where γ_0 > 0 is a parameter. Introducing for convenience n̄ := n(1−z) and expanding for large n̄, we obtain the expansion (9) of ṽ^(0)_{n+1}(z). Furthermore, using the change of variable y := x − z in the integral, where h^(0)_n(z) denotes the solution to the threshold equation (10) associated with ṽ^(0)_n, expanding the integrand in y around 0 by Taylor's formula, and then integrating and using (10), we arrive at the matching expansion (11) of Ψ ṽ^(0)_n(z). The match between (9) and (11) occurs for γ_0 = √2. Therefore, for γ_0 > √2 the comparison inequality holds for large n̄, and applying Lemma 1 we see that the lim sup in (12) is finite; a parallel argument with γ_0 < √2 yields the lim inf counterpart (13). Combining (12) with (13), we obtain the principal asymptotics (14): v_n(z) ∼ √(2 n(1−z)).

For a better approximation, we consider test functions ṽ^(1)_n(z) that include a logarithmic correction with coefficient γ_1. Using a Taylor expansion with a remainder yields (15) for ṽ^(1)_{n+1}(z); on the other hand, the same substitution gives, for n̄ → ∞, the expansion (17). In fact, we only need the first term of (17) to obtain the expansion of Ψ ṽ^(1)_n(z) up to the desired order. This is down to the fact that the O(n̄^{−1}) term in (17) contributes only O(n̄^{−1}) to Ψ ṽ^(1)_n(z). Indeed, keeping n̄ as a parameter, let us view the integral on the third line of (15) as a function of its upper limit. In view of (16), h_1 := h^(1)_n(z) is a stationary point of the integrand. Expanding at h_1 with a remainder, we get, for some ζ ∈ [0, 1], the required estimate; letting n̄ → ∞ with ε = O(n̄^{−1}), we obtain the claim.
In light of this, integrating and expanding, we obtain (18). Expansions (14) and (18) pin down the logarithmic term, and we need one more iteration to bound the remainder. Consider the test functions ṽ^(2)_n(z) carrying an additional bounded term with parameter γ_2. For n̄ → ∞, we obtain the expansion (19) uniformly in z ∈ [0, 1), and, with some more effort, the expansion (20) for the integral. Appealing to (19), (20) and the first inequality in (21), we conclude that, for large n(1 − z), we have ṽ^(2)_{n+1}(z) > Ψ ṽ^(2)_n(z) for a suitable γ_2, so that v_n(z) − ṽ^(2)_n(z) for such γ_2 is bounded from above. On the other hand, exploiting the second inequality in (21), we derive that for large n(1 − z) we have ṽ^(2)_{n+1}(z) < Ψ ṽ^(2)_n(z) for another choice of γ_2; thus, by the asymptotic dominance lemma, v_n(z) − ṽ^(2)_n(z) for such γ_2 is bounded from below. However, since the last term of ṽ^(2)_n(z) is already bounded, it follows readily that (22) holds with an O(1) remainder. Our main result is the special case of (22) with z = 0.

Theorem 1 As n → ∞,

v_n = √(2n) − (1/12) log n + O(1).

Asymptotically Optimal Policies
Recall our definition of a policy with a variable acceptance window. For m ∈ N, m ≤ n, let h_m : [0, 1] → [0, 1] be threshold functions which define a policy via the acceptance window [z, z + h_m(z)], where z is the size of the last selected item and m is the number of remaining observations; the threshold functions thus depend on both quantities. As shown in the previous section, when v_m(z) > 1, the optimal policy has threshold functions determined by equation (7). However, there are good policies that can be defined more simply. For example, the stationary policy of Samuels and Steele [20] has a constant threshold, independent of the remaining sample size: it accepts every observation that exceeds the last selection by no more than √(2/n). Setting h_m(z) := min{√(2/n), 1 − z} for all m = 1, 2, …, n describes this strategy completely. Remarkably, this uncomplicated policy achieves asymptotic optimality up to the leading term of the expected performance. The intuition behind this choice of threshold lies in the derivation of the familiar mean-constraint bound on v_n (see, for example, [9]).
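The stationary policy is easy to simulate. The Monte Carlo sketch below is our own illustration (sample size, replication count and seed are arbitrary); it shows the expected length landing between 0.8·√(2n) and the upper bound √(2n).

```python
import math
import random

# Simulate the stationary Samuels-Steele policy: accept an observation
# iff it exceeds the last selection by no more than w = sqrt(2/n).
def stationary_policy_length(n, rng):
    w = math.sqrt(2.0 / n)          # constant acceptance window
    z, length = 0.0, 0
    for _ in range(n):
        x = rng.random()
        if z < x <= z + w:          # within the window above the last selection
            z, length = x, length + 1
    return length

rng = random.Random(1)
n, reps = 1000, 2000
mean_len = sum(stationary_policy_length(n, rng) for _ in range(reps)) / reps
print(round(mean_len, 2), round(math.sqrt(2 * n), 2))
```

The gap between the simulated mean and √(2n) reflects the lower-order loss of the constant-window rule, which the variable-window policies below are designed to reduce.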
A more sophisticated policy was introduced by Arlotto et al. [4]. In contrast to the stationary policy of Samuels and Steele, the acceptance window here is variable; the acceptance criterion is given by the threshold functions in (24). Normalising (24) leads to an acceptance condition with undetermined coefficients α_0, α_1, α_2, and successively working out their values leads, for the expected performance of the policy, to the same asymptotic expansion as in (22). Taken together with (22), this settles the conjecture in [4].

Theorem 2
The policy with threshold functions (24) has expected performance ṽ_n = ṽ_n(0) satisfying v_n − ṽ_n = O(1) as n → ∞.

The Quickest Selection Problem
We now turn to the quickest selection problem introduced in [3]. In contrast to the original problem, one asks for the minimum expected time β_k needed to choose an increasing subsequence of length k, in an online fashion, from an infinite sequence of random variables. Recall that, formally,

β_k = inf_{τ∈T} E τ_k.

Optimality Recursion in the Quickest Selection Problem
The quickest selection problem is a decision problem with an infinite horizon. Still, a version of the comparison method turns out to be useful here too. The value function β_k(z) in the quickest selection problem depends on the running maximum z and the number k of selections yet to be made. The first-step decomposition yields the dynamic programming equation. The marks above z occur with Geom(1 − z) interarrival times, and the residual problem is a rescaled copy of the original; hence β_k(z) = β_k/(1 − z), where β_k := β_k(0). Substituting this into the dynamic programming equation and changing the variable x → (1 − z)x + z, we arrive at the following.

Lemma 2
The optimal value function β_k satisfies the implicit recursion

β_{k+1} − β_k − β_k log(β_{k+1}/β_k) = 1,   (27)

initialised with β_1 = 1.
The optimal strategy amounts to the following rule. If at stage j some k items are yet to be chosen, the last selection was z, and the observed item is X_j = x, then the item should be selected if and only if x − z ≤ h_k (1 − z) for a suitable threshold h_k. For an arbitrary sequence of thresholds h_k ∈ [0, 1], we call a strategy defined in this way self-similar.

Arlotto et al. derived (3) by analysing the optimality recursion in the following form ([3], Lemma 3):

β_{k+1} = min_{0<t<1} (1/t) (1 + β_k log(1/(1 − t))).   (28)

The recursion (27) is, in fact, equivalent to (28). Indeed, the minimising value t = t* satisfies the first-order condition (29), whose solution is t* = 1 − β_k/β_{k+1}. Plugging the optimal t* into (28) and rearranging gives (27). Arlotto et al. also proved that the function k ↦ β_k is convex and showed the optimal policy to be self-similar, with the optimal thresholds

h_k = 1 − β_k/β_{k+1}.   (30)
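The equivalence is easy to check numerically. In the sketch below, (28) is read as β_{k+1} = min_{0<t<1}(1 + β_k log(1/(1−t)))/t, which is our reconstruction from the surrounding text; the input value β_k = 2.0 is arbitrary.

```python
import math

beta_k = 2.0   # an arbitrary test value of beta_k

def step(t):
    # right-hand side of (28) before minimisation
    return (1.0 + beta_k * math.log(1.0 / (1.0 - t))) / t

# crude grid search for the minimiser t* on (0, 1)
ts = [i / 100000.0 for i in range(1, 100000)]
t_star = min(ts, key=step)
beta_next = step(t_star)

# t* should agree with 1 - beta_k/beta_{k+1}
print(round(t_star, 4), round(1.0 - beta_k / beta_next, 4))
# the minimum value should satisfy the implicit recursion (27)
print(round(beta_next - beta_k - beta_k * math.log(beta_next / beta_k), 4))
```

The first line prints two matching numbers, and the second line prints a value close to 1, as prescribed by (27).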

Preparation for the Asymptotic Analysis
Recursion (27) possesses several useful analytical properties. To begin with, define a function G on the domain D := {(x, y) : 0 < x < y} by

G(x, y) := y − x − x log(y/x).

In terms of G, (27) becomes

G(β_k, β_{k+1}) = 1.   (31)

Recursion (31), taken together with the condition 1 = β_1 < β_2 < ⋯, defines the sequence β_k uniquely, as seen from the next lemma.
For x ≥ 0 define g(x) as a solution to G(x, g(x)) = 1.
This function g(x) has two branches, and we are interested in the upper branch.

Lemma 3 The function g has a branch whose graph lies entirely in the domain D = {(x, y) : 0 < x < y}.
Proof Calculating the partial derivatives G_x(x, y) = log(x/y) and G_y(x, y) = 1 − x/y, we see that, if G(x_0, y_0) = 1 with 0 < x_0 < y_0, then G_y(x_0, y_0) > 0 and, by the Implicit Function Theorem, in the vicinity of (x_0, y_0) there is a uniquely defined function g(x) with G(x, g(x)) = 1. If, furthermore, 0 < x_0 < y_0, then this function has derivative g′(x) > 1, since

g′(x) = −G_x/G_y = log(y/x)/(1 − x/y) = −(log z)/(1 − z) > 1, where z = x/y ∈ (0, 1).

Thus, if there is one such point (x_0, y_0) ∈ D, then there is a branch of g(x) staying in D. In particular, we can pick (x_0, y_0) = (1, y_0), where y_0 = 3.146⋯ solves y − log y = 2.
Note that g(0+) = 1, but g′(0+) = ∞. From now on we only consider the branch of g(x) defined in Lemma 3. In these terms, (31) reads β_{k+1} = g(β_k). That is, the sequence of optimal values β_k is obtained by iterating g, starting with β_1 = 1: β_2 = g(1), β_3 = g(g(1)), and so on. We now wish to find the asymptotic behaviour of g for large x.
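The iteration is straightforward to carry out numerically. The sketch below assumes G(x, y) = y − x − x log(y/x), our reconstruction of G consistent with the point (1, 3.146…) solving y − log y = 2 in the proof of Lemma 3, and with the expansion (5).

```python
import math

def g(x):
    # solve G(x, y) = y - x - x*log(y/x) = 1 for y by bisection;
    # G is increasing in y for y > x, and G(x, x+1) < 0 + ... < 1
    def G1(y):
        return y - x - x * math.log(y / x) - 1.0
    lo, hi = x + 1.0, x + 2.0   # g(x) > x + 1 since log(y/x) > 0
    while G1(hi) < 0.0:         # expand the bracket until the sign changes
        hi += 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if G1(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = [0.0, 1.0]               # beta[k] = beta_k; beta_1 = 1, index 0 unused
for k in range(1, 200):
    beta.append(g(beta[k]))

print(round(beta[2], 4))        # beta_2 = g(1), the root of y - log y = 2
k = 199
print(round(beta[k] / (k * k / 2.0 + k * math.log(k) / 6.0), 4))
```

The first output reproduces β_2 = 3.146…, and the ratio in the second output is close to 1, consistent with the two-term expansion of β_k.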

Lemma 4 The function g(x) possesses the asymptotic expansion

g(x) = x + √(2x) + 2/3 + √2/(18 √x) + O(x^{−1}),  x → ∞.   (32)
Proof Dividing both sides of G(x, y) = 1 by x yields

y/x − 1 − log(y/x) = 1/x.

Performing the change of variables z := y/x − 1, w := 1/x,

we arrive at

z − log(1 + z) = w.   (33)

Because lim sup y/x < ∞ and a = 1 is the unique solution to a = 1 + log a, we may conclude that y/x → 1.
In light of this, we may investigate (33) in the vicinity of z = w = 0. The function w(z) is analytic within a unit circle. Thus, expanding logarithm yields a series representation of w(z) Since w (0) = 0, the inverse function has an algebraic branch point at 0 of order 1 (see [16] for definition). The inverse z(w) is representable as Puiseux series in powers of w 1/2 , with coefficients that can be calculated recursively. From the first two terms of series (34) we obtain Plugging z(w) = √ 2w 1/2 + a 0 w + o(w), where a 0 is a constant coefficient, into (34) yields which provides us with a refinement Another iteration of the method with z(w) = √ 2w 1/2 + 2 3 w + a 1 w 3/2 + o(w 3/2 ), a 1 constant, results in the expansion Translating this back in terms of variables x, y, we obtain the desired asymptotic expansion Suppose now that (x k ) is a sequence of iterations with some x 1 > 0. Since x k+1 > x k + 1, we have x k+1 > x 0 + k, and so x k → ∞, as k → ∞. Thus, by Lemma 4, To derive the leading asymptotic term from (35) we only need The idea is to compare x k with a solution of the analogous differential equation Equation (37), in turn, yields An application of the mean value theorem leads to Recalling that lim k→∞ x k+1 /x k = 1 and the asymptotics (36), we obtain and, therefore, The recursion x k+1 = g(x k ) is homogeneous, meaning that any consequent term x k+1 of the sequence is a function of x k only, independent of k. Thus, we are interested in how the shift in the initial condition affects the sequence for large k.

Lemma 5
For any sequence (x_k) solving the recursion G(x_k, x_{k+1}) = 1, it holds that x_k − β_k = O(k) as k → ∞.

Proof If x_1 = β_1, the sequences are identical and the assertion is trivial. We first examine the case x_1 > β_1. Given the monotonicity of g(x), we have x_k > β_k for all k ∈ N. Thus, it suffices to prove that there exists a positive constant c such that x_k − β_k ≤ ck for all k.
Since the sequence (β_k) is unbounded and increasing, we can find a finite k_0 such that β_{k_0} > x_1. Having identified the point k_0 of the inequality direction change, we know that, by the monotonicity of g(x), the elements β_{k_0+1}, β_{k_0+2}, … dominate x_2, x_3, … respectively. Hence, comparing x_k to the shifted sequence yields x_k ≤ β_{k+k_0−1}, whence the upper bound x_k − β_k ≤ β_{k+k_0} − β_k. The asymptotic expansion (35) together with (38) implies β_{k+1} − β_k = O(k); therefore β_{k+k_0} − β_k ≤ k_0 M k for a constant M with β_{k+1} − β_k ≤ Mk, allowing us to choose c := k_0 M. Now we turn to the case x_1 < β_1. By the monotonicity of g(x), it is enough to show that there exists a positive constant c_1 satisfying x_k − β_k ≥ −c_1 k for all k; this follows by the symmetric argument.
Before stating an analogue of Lemma 1, we need to highlight the following monotonicity property of G.

Lemma 6 If G(u, v) > 1 and u > x, then v > g(x). Analogously, if G(u, v) < 1 and u < x, then v < g(x).

Proof From G(u, g(u)) = 1 and G(u, v) > 1, the monotonicity of G in the second argument (which follows from G_y = 1 − x/y > 0 on D) implies v > g(u). Since g is increasing, u > x implies g(u) > g(x). Hence v > g(x). Analogously, u < x implies g(u) < g(x); from G(u, v) < 1 and the monotonicity of G, we have v < g(u) < g(x), which completes the proof. Now, we state and prove the analogue of Lemma 1 in the quickest selection problem.

Lemma 7
Let (x_k) be an increasing sequence such that G(x_k, x_{k+1}) > 1 (or, equivalently, x_{k+1} > g(x_k)) for all sufficiently large k. Then, for some constant c > 0, β_k − x_k < ck for all k. Similarly, if G(x_k, x_{k+1}) < 1 (or, equivalently, x_{k+1} < g(x_k)) for all sufficiently large k, then, for some c > 0, x_k − β_k < ck for all k.

Proof Assume, to the contrary, that for an arbitrarily large c_0 > 0 there exists k_0 ∈ N such that β_{k_0} − x_{k_0} > c_0 k_0, while x_{k+1} > g(x_k) for k ≥ k_0 (40). It is easy to see that β_{k_0} < x_{k_0} would lead to a contradiction with the second inequality in (40); thus, we only consider the case β_{k_0} > x_{k_0}. Introducing a sequence (y_k) that satisfies G(y_k, y_{k+1}) = 1 and y_{k_0} = x_{k_0}, we have, by Lemma 6, x_k ≥ y_k for k ≥ k_0 (41). Moreover, by Lemma 5, there exists a positive constant c_1 such that β_k − y_k < c_1 k for all k (42). Let c_2 := c_0 ∨ c_1. Then we can find k_1 ≥ k_0, k_1 ∈ N, such that β_{k_1} − x_{k_1} > c_2 k_1, while β_k − y_k < c_2 k for all k ∈ N; in view of (41) this is a contradiction.
For the second part of the lemma, assume G(x_k, x_{k+1}) < 1 for large k, but that for an arbitrarily large constant c_0 one can find k_2 such that x_{k_2} − β_{k_2} > c_0 k_2 (43). When β_{k_2} > x_{k_2}, this leads to a contradiction with (43) immediately; hence, we only consider the case β_{k_2} < x_{k_2}. For a sequence (z_k) satisfying G(z_k, z_{k+1}) = 1 and z_{k_2} = x_{k_2}, we have, by Lemma 6, x_k ≤ z_k for k ≥ k_2 (44). By virtue of Lemma 5, we can find a positive constant c_1 such that β_k − z_k > −c_1 k for all k. With c_2 := c_0 ∨ c_1, we can find k_3 ≥ k_2, k_3 ∈ N, such that x_{k_3} − β_{k_3} > c_2 k_3 (45). Since c_2 ≥ c_1, one has β_k − z_k > −c_2 k for all k; hence, by (44), x_{k_3} − β_{k_3} ≤ z_{k_3} − β_{k_3} < c_2 k_3. However, this contradicts (45), which finalises the proof of the lemma.
With this result in our toolbox, we are fully equipped to refine the asymptotic expansion of β k .

Asymptotic Expansion of β k
The order of the next term in the expansion of β_k is readily suggested by the upper bound in (3). However, to support this guess, we provide a heuristic argument based on the natural duality between this problem and the original problem of selecting the longest increasing subsequence.
We are taking a step further in exploring the connection between the two problems. Obtaining the asymptotic inverse of (23) suggests that β_k ≈ k²/2 + (1/6) k log k. Thus, the heuristics hint at a second term of order k log k. In view of this, we choose the first approximating sequence

x^(0)_k = k²/2 + ω_0 k log k,

where ω_0 is a parameter. Recalling the expansion (32), we obtain, as k → ∞, matching expansions of the increments x^(0)_{k+1} − x^(0)_k and of g(x^(0)_k) − x^(0)_k. It follows straightforwardly that, for k large enough, the comparison inequalities (46) hold: one direction for ω_0 > 1/6 and the other for ω_0 < 1/6. Combining the inequalities (46) with Lemma 7 produces the intermediate result (47): β_k = k²/2 + (1/6 + o(1)) k log k. To bound the remainder, we need yet another successive approximation. Choose a test function of the form

x^(1)_k = k²/2 + (1/6) k log k + ω_1 k,

where ω_1 is a constant. On the one hand, we have, as k → ∞, an expansion of the increments x^(1)_{k+1} − x^(1)_k; on the other hand, taking all four terms of the expansion (32) gives a matching expansion of g(x^(1)_k). Hence the two expansions agree up to terms of smaller order. Recall that a shift in the initial condition of the optimality recursion (27) results in a change of order O(k) to the solution; since the comparison with the approximating sequence x^(1)_k provides a refinement of smaller order, we may bound the remainder in the expansion (47).

Theorem 3
The minimum expected time required to select an increasing subsequence of length k satisfies the asymptotic expansion

β_k = k²/2 + (1/6) k log k + O(k),  k → ∞.   (48)

Corollary 3
The optimal threshold h_k satisfies a refined asymptotic expansion, sharpening the principal order h_k ∼ 2/k.

Proof Unfortunately, the direct computation of h_k = 1 − β_k/β_{k+1} from (48) does not yield the refinement, due to the O(k) remainder. However, Lemma 7 of [3] provides an asymptotic approximation to the optimal threshold functions in terms of the optimal value functions. Using the one-term asymptotic expansion β_k ∼ k²/2 there recovers only the principal order; plugging in the refined asymptotics (48) instead leads to the desired result.

A Quasi-Stationary Policy
In this section, we construct a simple quasi-stationary policy whose expected time of selection matches β_k up to the leading term of the expansion as k grows large. We call it quasi-stationary because it has a second, more conservative selection mode with a narrower acceptance window. Analogously to the stationary policy of Samuels and Steele [20], the threshold functions of our quasi-stationary policy do not depend on the number of elements remaining to be chosen. We define the policy by choosing the threshold functions h_i(z), i = 1, …, k, equal to a constant η below the level 1 − a(k) − η and equal to a(k)/k above it, where a(k) : R_+ → [0, 1] is decreasing in k; we fix a(k) in the sequel. The policy acts in two regimes. First, we accept every consecutive observation within η above the last selected item. Second, once the last selection exceeds 1 − a(k) − η, we abandon the initial rule and accept all admissible elements within an acceptance window of size a(k)/k.
The choice of η is inspired by the asymptotics (30) of the optimal threshold. However, choosing η = 2/k exactly leads to a problem: with high probability the selection process crosses the barrier 1 − a(k) − η while O(k^{1/2}) elements are yet to be chosen. Loosely speaking, as k gets large, the selection process with a constant window is governed by the central limit theorem: although the expectation of the sum of k random variables distributed uniformly on [0, 2/k] is 1, the sum has a standard deviation of order k^{−1/2}. A way to overcome this issue is to decrease the window size so that the probability of reaching the barrier early is small, yet keep the window large enough that the expected time of selection remains unchanged up to terms of lower order. The task narrows down to choosing a suitable a(k).
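A Monte Carlo sketch illustrates the two-regime rule. The specific choices below are our own assumptions for illustration: a(k) = k^(−1/2+ε) with ε = 0.1 as in the proof, and a fast-mode window η = 2(1 − 2a(k))/k, slightly below 2/k, so that the barrier is typically not crossed early; the paper fixes η through its own analysis.

```python
import math
import random

# Quasi-stationary policy: constant window eta until the running maximum
# exceeds the barrier 1 - a(k) - eta, then a narrow window a(k)/k.
def quasi_stationary_time(k, rng):
    a = k ** (-0.5 + 0.1)                 # a(k) = k^(-1/2+eps), eps = 0.1
    eta = 2.0 * (1.0 - 2.0 * a) / k       # hypothetical fast-mode window < 2/k
    barrier = 1.0 - a - eta
    z, chosen, t = 0.0, 0, 0
    while chosen < k:
        t += 1
        x = rng.random()
        w = eta if z <= barrier else a / k    # conservative mode near the top
        if z < x <= z + w:
            z, chosen = x, chosen + 1
    return t                               # total number of observations used

rng = random.Random(7)
k, reps = 100, 200
mean_t = sum(quasi_stationary_time(k, rng) for _ in range(reps)) / reps
print(round(mean_t), k * k // 2)
```

For moderate k the simulated mean exceeds k²/2 by a noticeable factor, reflecting the lower-order losses; the ratio approaches 1 only slowly as k grows.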
In the rest of this section, β̃_k denotes the expected performance of the quasi-stationary policy.

Theorem 4 The quasi-stationary policy with threshold functions h_i(z) is asymptotically optimal to the leading order: β̃_k ∼ k²/2 as k → ∞.
Proof Let (Z_j)_{j∈N} denote the process of last selections under the quasi-stationary policy. Introduce the hitting time ξ of the barrier 1 − a(k) − η, ξ := inf{j : Z_j > 1 − a(k) − η}, where we follow the convention inf ∅ = ∞. Moreover, define the stopping time ρ := ξ ∧ k.
In this notation, we can write β̃_k out as in (49). Before the barrier is hit, the inter-selection times are independent and identically distributed as Geom(η); moreover, by Wald's identity, the expected time spent before ρ is controlled as in (50). Consequently, we obtain (51). The second expectation in (49) is bounded by the expected time of selection in the case when the barrier is hit. Thus, a rough upper bound on E(τ_k | ρ < k) suffices for our purposes: it follows by computing the expected time to select all k elements with a constant window a(k)/k. To get a grip on P(ρ < k), introduce a renewal sequence (S_j) with inter-arrival times distributed uniformly on [0, η]. For j < ρ, this sequence is equal in distribution to the sequence of gaps between consecutive selections Z_{j+1} − Z_j. In light of this, we can write (52). Denoting by μ the mean inter-arrival time, the probability on the right-hand side of (52) can be written in terms of μ as in (53), with a deviation of order a(k) − 2/k. The probability in focus can be estimated from above by applying the Chernoff–Hoeffding inequality (see, for example, [7] for details), giving (54). Thus, choosing a(k) := k^{−1/2+ε}, 0 < ε < 1/2, we ensure that the probability in (53) has an exponentially decreasing upper bound. With a(k) finally fixed, the upper bound (50) gives (55), and taking together (51), (54) and (55) yields (56). A sufficient lower bound on β̃_k follows from the inequality for the expected time needed to advance the running maximum to the level 1 − a(k).
Plugging in the expression for a(k) yields (57). At last, combining (56) with (57) leads to a matching two-sided estimate of β̃_k, and the result of Theorem 4 follows immediately.

A Self-Similar Policy
We shall next construct a self-similar policy to approach optimality more closely. Recall that a selection policy is self-similar if, when k items remain to be chosen and the last selection was z, it chooses the observation of size x if and only if x − z ≤ h_k (1 − z). Let β̃_k be the value functions of such a strategy; then decomposing by the first arrival, computing the integral and rearranging yields an inhomogeneous linear recursion, which can be solved explicitly in terms of the h_k's by the method of variation of constants. Introduce a self-similar suboptimal selection policy with suitably chosen thresholds h_k. Note that h_k < 1 for k > 1; thus β̃_k < ∞ for all k. The recursion defining the value functions β̃_k becomes

β̃_{k+1} = a_k β̃_k + b_k,  β̃_1 = 1,   (59)

where the coefficients a_k and b_k are determined by the thresholds. The homogeneous equation (59) has the general solution of the form y_{k+1} = a_1 ⋯ a_k y_1.
Taking two terms in the expansion of the logarithm, we get a_1 ⋯ a_k of linear order, which readily implies y_k ∼ c y_1 k for some c > 0. We see that y_k is roughly linear in the initial value y_1. Likewise, because the general solution of the inhomogeneous equation is the sum of a particular solution and the general solution of the homogeneous equation, if we replace the initial value β̃_1 = 1 by β̃_1 + θ, the corresponding solution changes by about θck.
On the other hand, equation (59) has the monotonicity properties needed to apply the asymptotic comparison method (since a_k > 0). Checking that a test function satisfies the appropriate inequality for k > k_0, we adjust the initial value at this k_0 (resulting in an O(k) deflection) and apply the comparison in the already familiar way.
The comparison lemma adapted to recursion (59) states that if a sequence (y_k) is such that y_{k+1} > a_k y_k + b_k for all large k, then β̃_k − y_k < ck for some c > 0, and vice versa.
Following the usual procedure, we choose test functions in the form of a three-term expansion with adjustable coefficients. The computation consists of three successive refinements, whose explicit details we omit. Matching the coefficients and observing that the last term of y_k is of order o(k), we obtain that the expected performance of this self-similar policy is within O(k) of the optimum β_k. This result, together with the expansion (48), allows us to obtain the following theorem, which is the final accord of this paper.
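The recursion (59) is easy to iterate for any concrete threshold sequence. The sketch below uses the hypothetical choice h_k = 2/(k + 2) together with the one-step decomposition β̃_{k+1} = (1 + β̃_k log(1/(1 − h_k)))/h_k, our reconstruction of the decomposition by the first arrival; this simple choice already matches k²/2 to the leading order, though not the finer terms achieved by the refined thresholds.

```python
import math

# Iterate the inhomogeneous linear recursion (59) for a self-similar policy
# with the hypothetical thresholds h_k = 2/(k+2):
#   a_k = log(1/(1-h_k))/h_k,  b_k = 1/h_k.
K = 300
beta_tilde = [0.0, 1.0]                   # beta~_1 = 1; index 0 unused
for k in range(1, K):
    h = 2.0 / (k + 2)
    a_k = math.log(1.0 / (1.0 - h)) / h   # homogeneous coefficient
    b_k = 1.0 / h                          # inhomogeneous term
    beta_tilde.append(a_k * beta_tilde[k] + b_k)

ratio = beta_tilde[K] / (K * K / 2.0)
print(round(ratio, 3))                    # slightly above 1 for large K
```

The ratio stays slightly above 1, as it must, since the value of any suboptimal policy dominates β_k ≥ k²/2.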

Concluding Remarks
In this paper, we studied two classical sequential selection problems initiated by Samuels and Steele [20] and Arlotto et al. [3]. To refine the asymptotic expansions of the respective value functions, we developed a method of approximating solutions to difference equations satisfying certain monotonicity criteria. This 'asymptotic comparison' method, as we called it, allowed us to methodically obtain finer asymptotics of the solution to the optimality equation by bounding it from above and below with suitable test functions. In fact, we believe this method to be applicable to a wider class of value function recursions. In particular, the method could be adapted to improve the value function asymptotics in the closely related online bin-packing problem [11], where only the principal term is currently known. Theorem 1 covers the special case of the bin-packing problem with uniform weights and a unit-sized bin. This suggests a logarithmic term in the expansion for the general case too, which ties in nicely with the logarithmic regret bound derived by Arlotto and Xie [1].
Although this paper achieves high precision in estimating the mean length v_n in the longest increasing subsequence problem, there are several possibilities for improving the result. For example, one may try to prove the convergence of the O(1) term in the expansion (4) to a constant. With this settled, it may be possible to derive a refined expansion of the variance Var L_n(τ*), as was demonstrated in Gnedin and Seksenbayev (2019) [14] for the poissonised variant of the problem. Moreover, building a direct bridge between the results in the classical and the poissonised problems remains an open problem.