Optimality Regions and Fluctuations for Bernoulli Last Passage Models

We study the sequence alignment problem and its independent version, the discrete Hammersley process with an exploration penalty. We obtain rigorous upper bounds for the number of optimality regions in both models near the soft edge. At zero penalty the independent model becomes an exactly solvable model and we identify cases for which the law of the last passage time converges to a Tracy-Widom law.

1. Introduction
1.1. Directed growth models. In this article we study a generalisation of two specific models of directed last passage percolation: the longest common subsequence model, which concerns the size of the longest common subsequence between words drawn uniformly from a finite alphabet [8], and an independent version introduced in [40] as an exactly solvable discrete analogue of the Hammersley process [20]. We call the latter the independent model.
We study these models near directions for which the corresponding shape function starts developing a flat segment, which is called the soft edge of the model. Both models fit in the general framework of [14], namely there is:
(i) The random environment $\omega \in \mathbb{R}^{\mathbb{Z}^2}$, whose law we denote by P. Each marginal $\omega_u$ should be viewed as a random weight placed on site $u \in \mathbb{Z}^2$.
(ii) A collection Π of admissible paths on $\mathbb{Z}^2$. A path π from u to v is uniquely identified by an ordered sequence of integer sites, so when necessary we write $\pi = \{u = u_0, u_1, \ldots, u_\ell = v\}$. A path π is admissible if and only if its increments $z_k = u_k - u_{k-1}$ are contained in a finite set $\mathcal{R} \subset \mathbb{Z}^2$. For $u, v \in \mathbb{Z}^2$ we denote the set of admissible paths from u to v by $\Pi_{u,v}$. It is a requirement that P is stationary and ergodic under the shifts $T_z$, $z \in \mathcal{R}$.
(iii) A measurable potential function $V : \mathbb{R}^{\mathbb{Z}^2} \times \mathcal{R}^\ell \to \mathbb{R}$. For the two models under investigation we always have $\ell = 1$ and V is a bounded function, thus satisfying the technical assumptions of [14].
The point-to-point last passage time from u to v is the random variable $G^V_{u,v}$, the maximal total potential collected along admissible paths in $\Pi_{u,v}$. A well studied version of the model is the corner growth model, for which $\mathcal{R} = \{e_1, e_2\}$, the coordinates of ω are i.i.d. under P and the potential V is defined by
(1.2) $V(\omega, z) = \omega_0, \quad z \in \mathcal{R} = \{e_1, e_2\}$.
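As an illustration of the corner growth model under potential (1.2), the following minimal sketch computes the last passage time on a finite rectangle by dynamic programming. The function name `corner_growth_time` is hypothetical, and the convention that the weights of both endpoints are collected is an assumption made for the example; it is not taken from the text.

```python
def corner_growth_time(weights):
    """Last passage time over up-right paths (steps e1, e2) in a rectangle.

    weights[i][j] is the weight at site (i, j). Assumption: the weights at
    both the starting and the final site are counted.
    """
    m, n = len(weights), len(weights[0])
    T = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Best value among the admissible predecessors of (i, j).
            previous = []
            if i > 0:
                previous.append(T[i - 1][j])
            if j > 0:
                previous.append(T[i][j - 1])
            T[i][j] = weights[i][j] + (max(previous) if previous else 0)
    return T[m - 1][n - 1]
```

For instance, `corner_growth_time([[1, 2], [3, 4]])` compares the two admissible paths, with totals 1 + 2 + 4 and 1 + 3 + 4, and returns the larger one.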
Whenever we refer to the last passage time under this potential and these admissible steps, we write T instead of $G^V$. It is expected that under some regularity assumptions on the moments and continuity of $\omega_0$, the asymptotic behaviour of T (e.g. fluctuation exponents for T and the maximal path, distributional limits, etc.) is environment-independent. This is suggested by results available for the two much-studied exactly solvable models, in which $\omega_0$ is exponentially or geometrically distributed, and is further evidenced by the general theory in [14][15][16] and the edge results of [7,31], as we discuss later.
The main models in this article have set of admissible steps $\mathcal{R} = \{e_1, e_2, e_1 + e_2\}$ and the coordinates of the environment take values in {0, 1}. Our choice of potential is a two-parameter family of bounded functions, indexed by two non-negative parameters α and β: a diagonal step collects weight 1 when it lands on a site of weight 1 and pays a penalty α when it lands on a site of weight 0, while every horizontal or vertical step pays a penalty β. This particular choice of potential is inspired by a problem which appears in computational molecular biology, computer science and algebraic statistics, as we explain at the end of this introduction.
Our strongest results are obtained when α = β = 0 and the marginals of ω are i.i.d. Bernoulli random variables on {0, 1} with parameter p ∈ (0, 1), because we then obtain a solvable model [39]. This will be referred to as the independent model, and the passage time from (0, 0) to (m, n) is denoted $G^{(\alpha,\beta)}_{m,n}$ when both α and β are important. When α = 0 we further simplify notation by writing $G^{(\beta)}_{m,n} = G^{(0,\beta)}_{m,n}$. The special case α = β = 0 was studied in [5,13,40]. Asymptotic results as p tends to zero were obtained in [25].
We consider a rectangle of height n and width $m_n = \lfloor n/p - xn^a \rfloor$ for a ∈ (0, 1) and show that the fluctuations of $G^{(0)}_{m_n,n}$ converge, suitably rescaled, to the Tracy-Widom GUE distribution. The size of the rectangle is not arbitrary. A justification for this choice comes from the limiting shape function
$g_{pp}(t) = \lim_{n\to\infty} n^{-1} G^{(0)}_{\lfloor nt \rfloor, n}$,
which is continuous in t. When t > 1/p the function has a flat edge: $g_{pp}(t) = 1$. When p < t < 1/p, $g_{pp}(t)$ is strictly concave, and when t < p, $g_{pp}$ has another flat edge, namely $g_{pp}(t) = t$. Fluctuations of $G^{(0)}_{\lfloor nt \rfloor, n}$ are of order $n^{1/3}$ when t ∈ (p, 1/p), so by looking at the rectangle $m_n \times n$ we study these fluctuations at the onset of the flat edge, while macroscopically the aspect ratio converges to the critical point t = 1/p.
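To make the independent model at α = β = 0 concrete, here is a minimal sketch of the passage time computation. It assumes, in line with the description of the alignment picture, that weight is collected only by a diagonal step landing on a site of weight 1; the function names and the 0-based indexing of the environment are choices made for the example.

```python
import random

def bernoulli_environment(m, n, p, rng):
    """i.i.d. Bernoulli(p) weights on an m x n block of sites."""
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(m)]

def passage_time_zero_penalty(omega):
    """G^{(0)}_{m,n}: steps e1, e2 are free, a diagonal step onto site
    (i, j) collects omega[i-1][j-1] (assumed convention)."""
    m, n = len(omega), len(omega[0])
    G = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            G[i][j] = max(G[i - 1][j],                            # e1 step
                          G[i][j - 1],                            # e2 step
                          G[i - 1][j - 1] + omega[i - 1][j - 1])  # e1 + e2 step
    return G[m][n]
```

A rough Monte Carlo estimate of the shape function at aspect ratio t is then `passage_time_zero_penalty(bernoulli_environment(int(n * t), n, p, rng)) / n` for large n, with `rng = random.Random(seed)`; by construction the value never exceeds 1.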

1.2. Edge results.
There is a coupling of $G^{(0)}_{\lfloor n/p - xn^a \rfloor, n}$ with $T_{n, n^{2a-1}}$, which we describe in Section 4. This mapping was exploited in [13] to obtain the local weak law of large numbers for all a ∈ (1/2, 1). We use the same coupling to obtain a distributional limit for the edge. The coupling classifies results for $G^{(0)}_{\lfloor p^{-1}n - xn^a \rfloor, n}$ as "edge results". The terminology is motivated by the fact that the last passage time T is studied in a thin rectangle, either with dimensions $n \times yn$, letting y → 0 after sending n → ∞ [31], or with only one macroscopic edge, namely with dimensions $n \times xn^\gamma$ with γ < 1.
Several results near the edge are universal, in the sense that they do not depend on the particular distribution of the environment. In the sequel we denote the environment for the corner growth model by $\zeta = \{\zeta_u\}_{u \in \mathbb{Z}^2_+}$. An approximation of i.i.d. sums with a Brownian motion [26] was used in [17] to obtain the weak law of large numbers, with limiting constant c, and simulations led to the conjecture that c = 2. The conjecture was proved in [41] via a coupling with an exclusion process and later in [4] using a random matrix approach. A coupling with the Brownian last passage percolation model [4,36] allowed [7] to obtain the distributional limit (1.5), where W has the Tracy-Widom GUE distribution [43], the limiting distribution of the largest eigenvalue of a GUE random matrix. If $\zeta_0$ has exponential moments, (1.5) holds for all a ∈ (0, 3/7).
1.3. The alignment model. The problem of sequence alignment [34,42] can be cast in this framework. Consider two words $\eta^x = \eta^x_1 \ldots \eta^x_m$ and $\eta^y = \eta^y_1 \ldots \eta^y_n$ formed from a finite alphabet $\mathcal{A}$. We consider the case where each letter of $\eta^x$ and $\eta^y$ is chosen independently and uniformly at random from $\mathcal{A}$. We are looking for a sequence of elementary operations of minimal cost that transforms $\eta^x$ into $\eta^y$. These operations are:
(1) replace one letter of $\eta^x$ by another, at a cost α;
(2) delete a letter of $\eta^x$ or insert another letter, each at a cost β.
Assign a score of 1 for each match and subtract the costs for replacements, deletions and insertions. Each sequence of operations taking $\eta^x$ to $\eta^y$ is thus assigned a score $L^{(\alpha,\beta)}_{m,n}$, also often called the objective function. We will also write $L^{(\beta)}_{m,n}$ for $L^{(0,\beta)}_{m,n}$. A problem arising in molecular biology [1,21,35,37,44,46] is to maximise this alignment score. In that context the words $\eta^x$ and $\eta^y$ can be DNA strands (with $\mathcal{A}$ = {A, C, G, T}), RNA strands ($\mathcal{A}$ = {A, C, G, U}) or proteins (with $\mathcal{A}$ the set of amino acids that make up a protein), and the elementary operations correspond to mutations. A choice of the parameters α and β corresponds to a judgement on how frequently each type of mutation occurs. The optimal score for an alignment of $\eta^x$ with $\eta^y$ can then be considered a measure of similarity between these words. The question also appears in algebraic statistics [38]: there the objective function is the tropicalisation of a co-ordinate polynomial of a particular hidden Markov model.
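On the other hand, the alignment score $L^{(\alpha,\beta)}_{m,n}$ is the last passage time $G^{(\alpha,\beta)}_{m,n}$ in the environment generated by the two words,
(1.6) $\omega_{i,j} = \mathbf{1}\{\eta^x_i = \eta^y_j\}$,
i.e. the marginals of ω are (correlated) Bernoulli random variables with parameter $|\mathcal{A}|^{-1}$. The model with this choice of environment is referred to as the alignment model.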
A deletion of a character in $\eta^x$ corresponds to a horizontal step ($e_1$) in the last passage model, whereas an insertion of a letter into $\eta^x$ corresponds to a vertical step ($e_2$). Replacing a letter in $\eta^x$ by another corresponds to a diagonal step ($e_1 + e_2$) onto a point (i, j) where $\omega_{ij} = 0$, whereas any letter left alone (i.e. a successful alignment) corresponds to a diagonal step onto a point (i, j) where $\omega_{ij} = 1$.

Figure 1. Environment generated by the two strings AABABA and ABAABA. Colored dots correspond to the value 1, white dots to the value 0. The thickset path is a maximal path in this environment, from (0, 0) to (6, 6), with minimal number of vertical or horizontal steps (just 2 in this case). When α = 0, the illustrated path has score 5 − 2β since the environment only contributes to the weights if collected by a diagonal step. The score coincides with the last passage time for β ≤ 1/2. For α = 0 and β > 1/2 the main diagonal is optimal, with score equal to 4. These are the only two optimal paths, so there are two optimality regions.

The path in Figure 1 corresponds to the alignment in which the bar under the first A of $\eta^x$ corresponds to deleting the letter A from $\eta^x$, while the bar in $\eta^x$ corresponds to inserting the letter A there. A convenient way to look at this is that the bars, called gaps, are used to stretch the two words appropriately so that different matchings are obtained.
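The correspondence between elementary operations and lattice steps described in this subsection can be sketched as a short dynamic program. This is only an illustration: the function name `alignment_score` is hypothetical, and the recursion assumes the scoring rules stated above (1 for a match, cost α for a replacement, cost β for each deletion or insertion).

```python
def alignment_score(word_x, word_y, alpha, beta):
    """Optimal alignment score of word_x against word_y.

    S[i][j] is the best score for aligning the first i letters of word_x
    with the first j letters of word_y.
    """
    m, n = len(word_x), len(word_y)
    S = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = -beta * i          # only deletions (horizontal steps)
    for j in range(1, n + 1):
        S[0][j] = -beta * j          # only insertions (vertical steps)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Diagonal step: +1 on a match, -alpha on a mismatch.
            diag = 1.0 if word_x[i - 1] == word_y[j - 1] else -alpha
            S[i][j] = max(S[i - 1][j] - beta,
                          S[i][j - 1] - beta,
                          S[i - 1][j - 1] + diag)
    return S[m][n]
```

For the two strings of Figure 1 and α = 0, the sketch returns 5 − 2β when β ≤ 1/2 and 4 when β > 1/2, in agreement with the caption; for example `alignment_score("AABABA", "ABAABA", 0, 0.25)` equals 4.5.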
1.4. Optimality regions. Which paths are optimal depends on the choice of parameters α, β. In molecular biology these parameters are often chosen ad hoc and it is not clear that there is a single 'right' choice [44]. An alternative approach is to consider the space $C = [0, \infty) \times [0, \infty)$ of all possible parameters (α, β) and to analyse how the optimal paths change as (α, β) varies. A maximal subset of C on which the set of optimal paths does not change is called an optimality region of C. The optimality regions in C are semi-infinite cones bounded by the coordinate axes and by lines of the form β = c + α(c + 1/2) for certain values of c. So it suffices to study the number of regions with one parameter fixed; we will set α = 0.
Denote the number of optimality regions in this model by $R^{(al)}_{m,n}$. Naturally the (expected) number of optimality regions has attracted a lot of interest both theoretically [12,19,45] and in applications [10,24,30,33]. The current conjecture [11,38] is that $E(R^{(al)}_{n,n}) = O(\sqrt{n})$, but the complexity of the random variable does not allow for direct calculations. In this article we obtain an asymptotic lower bound for the optimal score when a is fixed, as well as upper bounds for the number of optimality regions when the rectangle is of dimensions $m_n \times n$. With random words of this size the biological applications are unrealistic but the results offer some insight from a theoretical perspective. Moreover, we prove that $O(\sqrt{n})$ for the expectation is not the correct order in this case, at least for a < 3/4.
Optimality regions can be studied in the independent model as well, and in fact we can obtain stronger results, again when the rectangle is of dimensions m n × n.

1.5. Outline. The paper is organised as follows: in Section 2 we state our main results. Section 3 contains preliminary results that do not depend on the specific choice of environment and therefore hold for both the alignment and the independent model. The results concerning the independent model are proved in Section 4, whereas in Section 5 we prove our results about the alignment model.
1.6. Notation. We briefly collect the pieces of notation discussed so far and list the most common notation used in the paper. Letters T, G and L all denote last passage times: T is for passage times under potential (1.2), G is the passage time for the independent model and L its counterpart for the alignment model. The letter R is reserved for the number of optimality regions, and we distinguish the regions in each of the two models by $R^{(al)}$ and $R^{(ind)}$. Throughout, p is a parameter in the interval (0, 1) and q = 1 − p. $\mathcal{A}$ is the alphabet in the alignment model and $|\mathcal{A}|$ is its size.

2. Results
In this section we state our main results, first for the independent model and then the softer ones for the alignment model.

2.1. Independent model. See Section 4 for a proof of Theorems 2.1, 2.3, 2.4 and Corollary 2.2.
We consider the last passage time $G^{(0)}_{m_n,n}$ with $m_n = \lfloor n/p - xn^a \rfloor$ for suitably chosen x. When the exponent a is small we obtain tightness without rescaling, for any choice of x; this is Theorem 2.1, in which Φ is the cumulative distribution function of the standard Gaussian distribution.
We will see in (3.12) that $R^{(ind)}_{m,n} < n - G^{(0)}_{m,n}$. As a corollary we obtain an asymptotic bound on the expected number of optimality regions:
For a > 1/2 we state a bound on the number $R^{(ind)}_{m_n,n}$ of optimality regions.
In the theorem above, equation (2.4) holds also when a > 3/4, however the bound $n^{2a/3}$ is sharper. Finally, when a ∈ (1/2, 5/7] we obtain Tracy-Widom fluctuations. It is worth noting that we do not take the standard approach of scaling by the variance. Instead, we change the size of the rectangle, by subtracting a term of size $n^{(2-a)/3}$ from the width. Then
(2) For 2/3 ≤ a ≤ 5/7,
Remark 2.5. The case a ≥ 5/7 corresponds to an exponent γ = 2a − 1 ≥ 3/7 in equation (1.5) (see [7]) and the result cannot be extended further with these techniques. In Section 3.1 of [7] the authors explain why their result should extend at least up to exponent γ = 3/4. The independent Bernoulli model here, while equivalent to the edge of the corner growth model, may be a bit more sensitive to these cut-offs, and indeed γ = 3/7 seems to be critical and manifests itself in the proof.
First, from the two cases of Theorem 2.4 we see that we need to amend the right-hand side of the event in (2.6) by a term $O(n^{3a-2})$ in order to get the non-trivial result in (2.7). This gives a new cut-off a = 2/3 (or γ = 1/3). The term is present in both cases, but when a ≤ 2/3 it is bounded and plays no role, whereas it must be dealt with for larger a.
Second, from the proof of Theorem 2.4, the exponent a = 5/7 (γ = 3/7) seems to be critical, since it is necessary to have $2a - 1 < (2-a)/3$ to balance the various orders of magnitude that appear. Assuming that the scaling in (1.5) remains the same for γ ∈ (3/7, 3/4), this change implies a corresponding correction term of size $O(n^\gamma)$ in the numerator of (1.5).
2.2. Alignment model. Throughout we fix a finite alphabet $\mathcal{A}$ with $|\mathcal{A}| \ge 2$, from which the letters of the words $\eta^x$ and $\eta^y$ are chosen uniformly at random, independently of each other, and let a ∈ (0, 1) and α, β ≥ 0. The proofs of Theorems 2.6, 2.7 and 2.8 can be found in Section 5.
Define . and the lower bound
Finally, we turn to the number of optimality regions for the alignment model. The first result gives an upper bound on the asymptotic growth of the number of regions:
Theorem 2.7. Let x > 0. There exists a constant $C_1(|\mathcal{A}|, x)$ that only depends on x and $|\mathcal{A}|$ so that The constant tends to 0 as the alphabet size tends to ∞.
We also have a bound of the same order for the expected number of optimality regions.
Theorem 2.8. There exists a constant $C_2(x, |\mathcal{A}|)$ so that The constant tends to 0 as the alphabet size tends to ∞.
Remark 2.9. These results are also valid for the independent model. Given the stronger bounds for the independent model, we do not expect (2.9) to be sharp, particularly for small values of the exponent a, and this is supported by Monte Carlo simulations. For example, these suggest that for a ≤ 1/2 the expected number of regions is bounded (see Figure 4). This is also the case for the independent model, as we see in Theorem 2.2. For a > 1/2, the simulations in Figure 5 show that the expected number of regions is growing for small alphabet sizes, but again the exponent of growth is smaller than 2a/3 and it seems to depend on the alphabet size.

3. Model independent results for optimality regions and maximal paths
In this section we present preliminary results about the two models that do not depend on the correlation structure of the weights. We therefore write $R_{m,n}$ to mean either $R^{(al)}_{m,n}$ or $R^{(ind)}_{m,n}$. We also introduce the vocabulary usually used in the sequence alignment literature.
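The last passage time G
For future reference we record the following observations:
(1) For fixed α ≥ 0 and any $\beta_1 \le \beta_2$, the weight of any path under $(\alpha, \beta_1)$ is at least its weight under $(\alpha, \beta_2)$, and therefore this inequality also holds for the passage times: $G^{(\alpha,\beta_1)}_{m,n} \ge G^{(\alpha,\beta_2)}_{m,n}$.
(2) For α = −1 and β = −1/2, the weight of any path $\pi \in \Pi_{0,(m,n)}$ is given by (m + n)/2, independently of π.
This result was first proved in [19]; we give a simplified proof here:
Proof. Pick any $(\alpha, \beta) \in \mathbb{R}^2_+$ and let $(0, \beta')$ be the point of intersection of the linear segment connecting (α, β) and (−1, −1/2) with the y-axis, i.e.
$\beta' = \dfrac{2\beta - \alpha}{2(1 + \alpha)}$.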
We will show that the optimal paths associated with $(0, \beta')$ are the same as those associated with (α, β). Consider any $\pi \in \Pi_{0,(m,n)}$ with s(π) = (x, y, z). A direct computation shows that the weight of π with parameters (α, β) is an affine function of its weight with parameters $(0, \beta')$, with a positive slope and an intercept that do not depend on π. Hence the same paths maximise both weights and the two parameters must belong to the same optimality region.
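The reduction to α = 0 used in this proof can be checked numerically with exact rational arithmetic. The sketch below rests on two assumptions that are not spelled out in the surviving text: that s(π) = (x, y, z) lists the numbers of mismatches, gaps and matches in that order, and that these satisfy 2x + y + 2z = m + n (a diagonal step being equivalent to two gaps). The function names are hypothetical.

```python
from fractions import Fraction

def reduce_to_alpha_zero(alpha, beta):
    """Height at which the line through (-1, -1/2) and (alpha, beta)
    meets the axis alpha = 0."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    return (beta - alpha / 2) / (1 + alpha)

def weight(stats, alpha, beta):
    """Score of a path with (assumed) statistics
    stats = (mismatches, gaps, matches)."""
    x, y, z = stats
    return z - alpha * x - beta * y

def affine_identity_holds(stats, m, n, alpha, beta):
    """Check W(alpha, beta) = (1 + alpha) * W(0, beta') - alpha * (m + n) / 2."""
    x, y, z = stats
    assert 2 * x + y + 2 * z == m + n   # assumed step-counting identity
    alpha, beta = Fraction(alpha), Fraction(beta)
    beta_prime = reduce_to_alpha_zero(alpha, beta)
    lhs = weight(stats, alpha, beta)
    rhs = (1 + alpha) * weight(stats, 0, beta_prime) - alpha * (m + n) / 2
    return lhs == rhs
```

Since the slope 1 + α is positive and the intercept does not depend on the path, the maximisers under (α, β) and under (0, β′) coincide, which is the content of the argument above.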
Under a fixed environment ω, we define the critical penalties to be the gap penalties for α = 0 at which the optimality region changes. We will also write $\beta_\infty$ for the last threshold $\beta_{R_{m,n}}$.
Proof. Continuity of the optimal score in the parameter β implies that at $\beta_{k+1}$ the weights will be the same whether $\beta_{k+1}$ is approached from above (considering scores of paths in $\Gamma^{(\beta_{k+1})}_{0,(m,n)}$) or from below (scores of paths in $\Gamma^{(\beta_k)}_{0,(m,n)}$). Therefore the two scores coincide at $\beta_{k+1}$, which yields the conclusion.
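For small words the critical penalties at α = 0 can be computed explicitly. The sketch below is an illustration under stated assumptions, not the method of the paper: it records, for each possible number of gaps, the maximal number of matches, views each such pair as a line β ↦ z − βy, and walks along the upper envelope of these lines for β > 0. The function names are hypothetical, and regions are counted as the linear pieces of the optimal score on (0, ∞).

```python
from fractions import Fraction

def max_matches_by_gaps(word_x, word_y):
    """Map g -> maximal number of matches over paths with exactly g gaps."""
    m, n = len(word_x), len(word_y)
    total = m + n
    best = [[[-1] * (total + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    best[0][0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            for g in range(total + 1):
                cur = best[i][j][g]
                if cur < 0:
                    continue                    # state not reachable
                if i < m:                       # horizontal step: one gap
                    best[i + 1][j][g + 1] = max(best[i + 1][j][g + 1], cur)
                if j < n:                       # vertical step: one gap
                    best[i][j + 1][g + 1] = max(best[i][j + 1][g + 1], cur)
                if i < m and j < n:             # diagonal step
                    gain = 1 if word_x[i] == word_y[j] else 0
                    best[i + 1][j + 1][g] = max(best[i + 1][j + 1][g], cur + gain)
    return {g: z for g, z in enumerate(best[m][n]) if z >= 0}

def critical_penalties(stats):
    """Breakpoints of beta -> max_g (stats[g] - beta * g) on (0, infinity)."""
    # Just above beta = 0: most matches, ties broken by fewest gaps.
    current = min(stats, key=lambda g: (-stats[g], g))
    thresholds = []
    while True:
        flatter = [g for g in stats if g < current]
        if not flatter:
            return thresholds
        # Line g overtakes the current line at (z_cur - z_g) / (y_cur - y_g).
        crossing, current = min(
            (Fraction(stats[current] - stats[g], current - g), g)
            for g in flatter)
        thresholds.append(crossing)
```

For the two strings of Figure 1 this returns the single critical penalty 1/2, hence two optimality regions, consistent with the caption.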
Upper bounds for the maximal value of $R_{m,n}$ can be found in [11]. For the LCS model these are sharp when the alphabet size grows to infinity. The results and arguments in [11] can be extended to give the upper bound
(3.10) $R_{\lfloor ns \rfloor + o(n), \lfloor nt \rfloor + o(n)} \le C n^{2/3}$,
which holds in any fixed realization of the environment, for any $(s, t) \in \mathbb{R}^2_+$ and n large enough. The authors of [11] also proved that environments that actually generate this many regions exist, at least when the alphabet size is infinite. This was later verified also for finite alphabets in [45].
Proof. Distinct paths $\pi_i$ differ in the number of diagonal steps and the number of gaps. Since a diagonal step is equivalent to two gaps, we have $y_i - y_{i+1} \ge 2$. Furthermore it must be the case that $z_i - z_{i+1} \ge 1$; otherwise $\pi_i$ would violate the MGM condition. Equation (3.2) and the last two inequalities give $x_{i+1} - x_i \ge 2$. Adding each inequality over i gives the first three terms in the minimum of (3.11). For the last term note that $y_{R_{m,n}} = n \vee m - m \wedge n$. Since $x_0 \ge 0$, (3.2) yields $2(n \wedge m - z_0) \ge y_0 - y_{R_{m,n}}$.
Remark 3.6. Notice that the last bound in (3.11) can be written in terms of the passage time at zero penalty, since $z_0 = G^{(0)}_{m,n}$; this is (3.12).
Finally, we present a lemma that gives a useful bound on the number of regions if a bit more information is available.
Lemma 3.7. Let m = m(n) be such that m(n) → ∞ as n → ∞. Let g(n) be a deterministic function such that $\lim_{n\to\infty} g(n) = \infty$. Then there exist an N > 0 and a non-random constant $C_0$ so that for all n > N we have the inclusion of events In particular,
Proof. Statements (3.14), (3.15) are immediate corollaries of (3.13), which we now show. Fix an environment $\omega \in A_n$. Then we have that The sum above has as terms the numerators and denominators of the critical penalties (see Lemma 3.4). Each critical penalty is a distinct rational number and it corresponds to a change of optimality region. The bound g(n) is independent of the environment, so we can obtain an upper bound on the number of regions that is independent of the environment if we maximize the number of terms that appear in the sum.
Since the terms in the sum are integers, the maximal number of terms is the maximal number of integers k that can be added so that the bound g(n) is not exceeded. Those integers k need not be distinct, but each must be expressible as a sum of integers k = a + b so that the ratios a/b are all different. This is because the ratios a/b correspond to critical penalties and those are distinct. Take each successive integer k and compute the number of irreducible fractions a/b so that a + b = k.
The number of irreducible fractions satisfying this is ϕ(k), where ϕ is Euler's totient function [3]. The number of distinct values k that can be used is $M_{\max}$, which must satisfy
$\sum_{k=1}^{M_{\max}} k\varphi(k) \le g(n) < \sum_{k=1}^{M_{\max}+1} k\varphi(k)$.
These inequalities imply that $M_{\max}$ will be bounded above, up to a lower order term, by $c\, g(n)^{1/3}$. This follows from the asymptotics of ϕ for large arguments, and we direct the reader to the proof of Theorem 5 in [11] for the details. The bound on $M_{\max}$ is true for all $n > N_1$ large enough. Then an upper bound for the number of admissible pairs (a, b) (and therefore for the maximal number of regions) is This last estimate is again the result of an analytic number theory formula (see [3]), which also works for $n > N_2$ large enough. So both deterministic bounds hold for all $n > N = N_1 \vee N_2$.
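The counting argument above can be explored numerically. The following sketch computes $M_{\max}$ from the two-sided inequality and the resulting number of admissible pairs $\sum_{k \le M_{\max}} \varphi(k)$ for a given budget g. The function names are hypothetical, and the sieve is a standard way of tabulating Euler's totient function rather than anything taken from the paper.

```python
def totients(limit):
    """Euler's totient function phi(0..limit), computed with a sieve."""
    phi = list(range(limit + 1))
    for k in range(2, limit + 1):
        if phi[k] == k:                      # k is prime
            for multiple in range(k, limit + 1, k):
                phi[multiple] -= phi[multiple] // k
    return phi

def region_bound(g):
    """Return (M_max, number of admissible pairs) for the budget g, where
    M_max is the largest M with sum_{k <= M} k * phi(k) <= g."""
    limit = 8
    while True:
        phi = totients(limit)
        total, pairs, m_max = 0, 0, 0
        for k in range(1, limit + 1):
            if total + k * phi[k] > g:
                return m_max, pairs
            total += k * phi[k]
            pairs += phi[k]
            m_max = k
        limit *= 2                           # table too short, enlarge it
```

Evaluating `region_bound(g)` along a geometric sequence of budgets gives a quick empirical view of the growth of $M_{\max}$ like $g^{1/3}$ and of the number of pairs like $g^{2/3}$.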
The difficulty with the alignment model is the correlated environment. Therefore, the soft techniques below try to avoid precisely this issue. The same techniques work for the BLIP model and give identical bounds, but the exact solvability of that model often allows sharper results.
Our strategy is to construct a path with a score that is near-optimal under any penalty β and which attempts to minimize as much as possible the number of vertical steps. This will be important for the lower bound for the passage time under penalty $\beta_R$, where we know that the optimal path takes no vertical steps. We present the construction and results for the alignment model, but re-emphasize that they hold for both.
3.1. Construction of the path. Fix an environment ω on $\mathbb{N}^2$, defined by two infinite words $\eta^x$, $\eta^y$, where each letter is chosen uniformly at random. $\omega_{i,j}$ is defined according to (1.6).
Consider the following strategy (S) to create a path $\pi_S$:
(1) For some appropriate constants $c_1$ and $c_2$ (to be determined later), move with $e_1 + e_2$ steps from 0 up to a fixed point $u_n(a)$.
(2) Now, from $u_n(a)$ construct the path as follows:
(a) If the path is on site (i, j) with j < n and $\omega_{i+1,j+1} = 1$ then move diagonally with an $e_1 + e_2$ step, and now the path is on site (i + 1, j + 1).
(b) If the path is on site (i, j) with $i < \lfloor |\mathcal{A}|n - xn^a \rfloor$ and $\omega_{i+1,j+1} = 0$ then move horizontally with an $e_1$ step, and now the path is on site (i + 1, j).
(c) If j = n or $i = \lfloor |\mathcal{A}|n - xn^a \rfloor$, move to $(\lfloor |\mathcal{A}|n - xn^a \rfloor, n)$.
From this description it is not clear whether we can enforce the condition that no vertical steps will be taken by $\pi_S$. However, this will happen for eventually all n, by choosing the constants $c_1$, $c_2$ appropriately. Consider an infinite path $\tilde{\pi}_S$ that moves according to strategy (S) but without the restriction $i < \lfloor |\mathcal{A}|n - xn^a \rfloor$ in (2)-(b) and without step (2)-(c).
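Strategy (S) can be sketched in a few lines for two finite words. Since the definition of the point $u_n(a)$ is not reproduced here, the length of the initial diagonal run is passed as a hypothetical parameter `diag_len`; the function name and the returned dictionary are likewise choices made for the example.

```python
def strategy_path(word_x, word_y, diag_len=0):
    """Follow strategy (S) in the rectangle [0, len(word_x)] x [0, len(word_y)].

    Returns the numbers of diagonal, horizontal and vertical steps and the
    number of matches collected. diag_len stands in for u_n(a).
    """
    width, height = len(word_x), len(word_y)
    assert 0 <= diag_len <= min(width, height)
    counts = {"diagonal": 0, "horizontal": 0, "vertical": 0, "matches": 0}
    i = j = 0
    # Step (1): initial run of diagonal steps.
    for _ in range(diag_len):
        counts["matches"] += int(word_x[i] == word_y[j])
        counts["diagonal"] += 1
        i, j = i + 1, j + 1
    # Step (2)(a)-(b): diagonal on a match, horizontal otherwise.
    while i < width and j < height:
        if word_x[i] == word_y[j]:
            counts["matches"] += 1
            counts["diagonal"] += 1
            i, j = i + 1, j + 1
        else:
            counts["horizontal"] += 1
            i += 1
    # Step (2)(c): go straight to the corner once a boundary is reached.
    counts["horizontal"] += width - i
    counts["vertical"] += height - j
    return counts
```

With random words of lengths $\lfloor |\mathcal{A}|n - xn^a \rfloor$ and n, the entry `"vertical"` is zero exactly when the path reaches the north boundary before the east boundary, which is the event analysed next.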
Let $Y_j$ be the random variables that give the number of horizontal steps that the path $\tilde{\pi}_S$ takes at level $y = j + u_n(a) \cdot e_2$,
(3.17) $Y_j = |\{i \in \mathbb{N} : (i, j + u_n(a) \cdot e_2) \in \tilde{\pi}_S\}|$.
Because $\tilde{\pi}_S$ does not have a target endpoint, we have By construction, the $Y_j$ are i.i.d. with mean $|\mathcal{A}|$. The path $\tilde{\pi}_S$ coincides with $\pi_S$ up until the point that $\tilde{\pi}_S$ hits either the north or the east boundary of the rectangle $[0, \lfloor |\mathcal{A}|n - xn^a \rfloor] \times [0, n]$. When $\tilde{\pi}_S$ touches the north boundary first, we can conclude that $\pi_S$ has no vertical steps up to that point. We will estimate precisely this probability, using the following moderate deviations lemma [9].
Lemma 3.8. Let (X N ) N ∈N an i.i.d. sequence of random variables with exponential moments. If N λ 2 N → ∞ and N λ 3 From the equality of events (3.20) {π S exits from the north boundary} = we estimate for a ≤ 1/2 and for n sufficiently large for the asymptotics in (3.19) to be accurate, The constant c 0 only depends on |A| which is assumed to be strictly larger than 1. Choose c 1 > 2 (|A|−1) 2 so that the probabilities of the event {π S exits from the east boundary} are summable in n. Then by the Borel-Cantelli lemma, we can find an M = M (ω) so that for all n > M pathπ S hits the north boundary first.
The situation for a > 1/2 is similar. Starting from (3.21), we have Then the proof goes as for the previous case, and again it suffices that c 2 > 2 (|A|−1) 2 . From the definition of π S and the above discussion, we have shown the following: ω k,k positive weight.
Since π S has the smallest number of gaps possible, it can be optimal under any penalty β.

4. The independent model
In this section we prove results about the independent model. We begin with a coupling of the longest common subsequence in the independent model with the corner growth model in an i.i.d. Geom(1 − p) environment. This is achieved via the following identity. Recall that $T_{m,n}$ denotes the last passage time in an $m \times n$ rectangle, with admissible $e_1$ or $e_2$ steps only, under potential (1.2).
The result follows from the arguments in [13], and we briefly present the main idea.
The discrete totally asymmetric simple exclusion process (DTASEP) with backward updating is an interacting particle system on left-finite particle configurations on the integer lattice, i.e. configurations such that sites to the left of some threshold are empty (see Figure 6). Label the particles from left to right and denote the position of the j-th particle at time $\ell \in \mathbb{N}$ by $\eta_j(\ell)$. At every discrete time step $\ell \in \mathbb{N}$ each particle independently attempts to jump one step to the left with probability q = 1 − p. Particle i performs the jump if either
(1) the target site was unoccupied by particle i − 1 at time $\ell - 1$, or
(2) the target site was occupied by particle i − 1, but that particle also performs a jump at time $\ell$.
In words, particles are forbidden to jump to occupied sites and we update from left to right. Start DTASEP with the step initial condition $\eta_i(0) = i$, so that initially the i-th particle is at position i. Let $\tau_{i,j}$ be the time it takes particle j to jump i times: Then the following recursive equation holds where the $\tilde{\zeta}_{i,j}$ are independent Geometric variables with parameter q = 1 − p, supported on $\mathbb{N}_0$.
By setting $\zeta_{i,j} = \tilde{\zeta}_{i,j} + 1 \sim \mathrm{Geom}(1 - p) \in \{1, 2, \ldots\}$, the $\tau_{i,j}$ can be coupled with the last passage time in the corner growth model (cf. [13], Lemma 5.1), giving the equality in distribution
We embed DTASEP in the two-dimensional lattice $\mathbb{Z} \times \mathbb{N}_+$, using its graphical construction as follows: Let $\{b_{k,\ell} : (k, \ell) \in \mathbb{Z} \times \mathbb{N}_+\}$ be a field of i.i.d. Bernoulli(q) random variables and assign to each site $(k, \ell)$ the random weight $b_{k,\ell}$. Particles are placed initially on $\mathbb{N}_+ \times \{0\}$, with particle i at coordinate $(\eta_i(0), 0)$. The Bernoulli marked sites signify which particles will attempt to jump in the DTASEP process.
After the spatial locations in the DTASEP at time $\ell = 1$ are determined, the particles in the graphical construction are at positions $(\eta_i(1), 1)$. We iterate this procedure for all times $\ell \in \mathbb{N}$.
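The dynamics of DTASEP with backward updating can be simulated directly from rules (1) and (2). The sketch below is a minimal illustration with a hypothetical function name; it truncates the step initial condition to finitely many particles and draws a fresh jump attempt for each particle at each time step, instead of reading the attempts from the Bernoulli field of the graphical construction.

```python
import random

def simulate_dtasep(num_particles, num_steps, q, rng):
    """Positions after num_steps of DTASEP started from eta_i(0) = i.

    Particles are updated from left to right within a time step, so a
    particle sees the already updated position of its left neighbour;
    this implements rules (1) and (2) at once.
    """
    positions = list(range(1, num_particles + 1))
    for _ in range(num_steps):
        for i in range(num_particles):
            attempts = rng.random() < q
            # Blocked only if the left neighbour still sits on the target site.
            blocked = i > 0 and positions[i - 1] == positions[i] - 1
            if attempts and not blocked:
                positions[i] -= 1
    return positions
```

With q = 1 every particle jumps at every step, so the whole configuration is translated to the left; with q = 0 nothing moves. For intermediate q the configuration stays strictly increasing, which is the exclusion rule.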
Then, the environments between graphical DTASEP and BLIP may be coupled via
In [13] the following combinatorial identity was proved:

Figure 6. DTASEP evolving under rules (1) and (2). Symbols ⊗ denote Bernoulli(p) weights equal to 1, and a particle underneath an ⊗ symbol cannot jump during that time, i.e. particles jump with probability 1 − p = q as long as the exclusion rule is not violated. The trajectory of particle 4 is highlighted for reference.

Since N(n) is eventually monotone, we can invert the expression above and find n in terms of N for sufficiently large n (and hence N). In particular, To see this we compute . Combining (4.1) and (4.5) The results follow by first dividing by √ p q √ N and then applying the central limit theorem, when we let n (hence N) tend to infinity. When a < 1/2 the right-hand side after scaling tends to 0 and the probability converges to 1/2. When a = 1/2 the right-hand side in the probability converges to $xpq^{-1/2}$ and the probability to $\Phi(xpq^{-1/2})$.

4.2. Proof of Theorem 2.2. We first show the result when a < 1/2. Using equations (3.12) from Remark 3.6 and (4.7) from the proof of Theorem 2.1, we have for $C_1$ large enough. As in the proof of Theorem 2.1 we have $N = nq/p - xn^a + k$, and we let Φ denote the cumulative distribution function of the standard normal distribution. Fix a tolerance δ > 0 satisfying Φ(δ) + δ < 1 and let $n_1(\delta)$ be large enough so that $C_1 N^{a-1/2} < \delta$ for all $n > n_1(\delta)$. Applying the Berry-Esseen theorem to the last line of the last display, For $n \ge n_0(\delta) = n_1(\delta) \vee n_2(\delta)$ the right-hand side of (4.9) is uniformly summable in k. Moreover, by (4.9) and the reverse Fatou's Lemma we compute where the penultimate inequality follows from (3.12) and the last from Theorem 2.1.
The case a = 1/2 is slightly more delicate, but the ideas are exactly the same. As before, The right-hand side converges to $(\Phi(xpq^{-1/2}))^k$ and with the same arguments as before,
The last inequality follows from (3.12) and the last equality is from (1.4). This gives the second part of the statement. When a ∈ ( 3 /4, 1) we can obtain a sharper bound using Lemma 3.7.
From the proof of Lemma 3.5 and Lemma 3.9 we can find a constant $C_1$ such that $n - G^{(\beta_R)}_{\lfloor p^{-1}n - xn^a \rfloor, n} = n - z_R < C_1 n^a$ in probability, as n grows. Therefore, with probability tending to 1 as n grows,
(4.11) $z_0 - z_R < C_1 n^a$.
Moreover, since the number of vertical steps at β = 0 cannot exceed $n - G^{(0)}_{\lfloor p^{-1}n - xn^a \rfloor, n}$, (1.4) gives that with probability tending to 1
(4.12) $y_0 - y_R \le n - G^{(0)}_{\lfloor p^{-1}n - xn^a \rfloor, n} < C_2 n^{2a-1}$.
Equations (4.11), (4.12) now yield a constant C such that (4.13) lim Let $A_n$ be the event in the probability above. , s ∈ R.
We further define an auxiliary parameter N that will go to ∞ when n goes to infinity. (4.14) where c n is given by Note that with m n = 1 p n − xn a − yn Our goal now is to change n to N and compute m n , n, c n in terms of N , similarly to the proof of Theorem 2.1. (1) Step 1: m n − n and c n as a function of N : Start from (4.14) and raise it to the power 2a − 1.
Then, apply Taylor's theorem to obtain Note that the equation above holds irrespective of the value of a, as long as a < 5/7; for a ∈ [0, 5/7) the exponent $(5a-4)/3 < 0$, so $c_n = N^{2a-1} + o(1)$ follows. Therefore, a substitution in (4.16) yields
(2) Step 2: n as a function of N: We begin by writing n as a function of N. Observe that N(n) in equation (4.14) is an eventually monotone function. Therefore, for N large enough, there is a well defined inverse n = n(N) (so that N(n(N)) = N). We cannot directly use a closed formula for the inverse, so we define the approximate inverse (N ) by To see that (N ) plays the role of the inverse n(N), substitute (N ) in (4.14) and estimate using a Taylor expansion the distance This contradicts (4.18) since β > 2a − 1. In particular we have shown that = 0, and we may write
To finish the proof we need to be a bit cautious with the integer parts. Define $k_N$ to be It follows from (4.17) that $k_N$ is bounded in N (and n). Also set N = N + ε N . Substituting these in equation (4.1) we compute The passage time in the probability above can be compared with $T_{N^{2a-1}, N}$ and satisfies Since a < 5/7, the number of geometric random variables in the right-hand side of the inequality is of lower order than $N^{(2-a)/3}$ and when scaled by it, the double sum vanishes P-a.s. This allows us to remove $k_N$ from (4.22), and equation (1.5) now gives the result by taking n → ∞.

5. Optimality regions in the alignment model
In this section we prove our results about the alignment model. Because of Lemma 3.3 and (3.7) it is enough to consider the case where α = 0. Now it is straightforward to prove Theorems 2.6 and 2.7.
5.1. Proof of Theorem 2.6. Restrict to the full measure set of environments so that Lemma 3.9 is in effect. Fix one such environment and assume n is large enough so that statements (1)-(3) of Lemma 3.9 hold. Let
$g^{(a)}(n) = \sqrt{n \log n}$ for a ≤ 1/2, and $g^{(a)}(n) = n^a$ for a > 1/2.
Path π S is admissible under any penalty β, therefore by re-arranging the terms in the inequality of Lemma 3.9,  This completes the proof.
We briefly explain the upper bound for $y_0$. First, any maximal path will always take at least the minimum number of gaps, which is $\lfloor |\mathcal{A}|n - xn^a \rfloor - n$. After that, it has to take the correct number of diagonal steps to gain weight equal to $L_{\lfloor n|\mathcal{A}| - xn^a \rfloor, n}$. Now all the remaining steps can either be gaps or mismatches, so we obtain an upper bound if we assume the number of mismatches is zero. The bound then follows from (3.2).
On the complement of D n we bound R by n, by virtue of (3.12). Then for n large enough, E(R This gives the result.