Tracy-Widom distribution, Airy 2 process and its sample path properties

The Tracy-Widom distribution was first discovered in the study of the largest eigenvalues of high-dimensional Gaussian unitary ensembles (GUE), and since then it has appeared in a number of apparently distinct research fields. It is believed that the Tracy-Widom distribution has a universal feature like the classical normal distribution. The Airy 2 process is defined through its finite dimensional distributions, with the Tracy-Widom distribution as its marginal distribution. In this introductory survey, we briefly review some basic notions, intuitive background and fundamental properties of the Tracy-Widom distribution and the Airy 2 process. For ease of reading, the paper starts with some simple and well-known facts about normal distributions, Gaussian processes and their sample path properties.

Received: 2020-9-1. Revised: 2020-11-17. MR Subject Classification: 60K35, 60B20.

§1 Normal Distributions and Gaussian Processes

This section contains some simple and well-known facts about normal distributions and Gaussian processes. The reader can find them in most advanced probability and mathematical statistics textbooks for graduate students, see [4, 5] by Billingsley, [12] by Durrett, [23] by Lin, Lu and Su. The purpose of this section is to draw, from the well-studied normal distributions and Gaussian processes, some helpful clues for the subsequent work on the Tracy-Widom distribution and the Airy 2 process. Tracy-Widom distributions are quite novel compared with normal distributions, and have been a very active topic in probability and statistics in the past twenty-five years. Throughout the paper, c1, c2, · · · stand for positive constants possibly varying from place to place.

1.1 Normal Distributions

The normal distribution is arguably the most important distribution in probability theory and mathematical statistics. The normal density function is defined as follows:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R},$$

where µ ∈ R and σ > 0 are two parameters. In particular, $p(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the so-called standard normal density function. Its graph is a symmetric bell-shaped curve that approaches the x-axis asymptotically as |x| → ∞. It is an elegant and nontrivial exercise to verify

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 1.$$

The standard normal distribution function Φ(x) is defined as

$$\Phi(x) = \int_{-\infty}^{x} p(u)\, du, \quad x \in \mathbb{R}.$$

Φ(x) is a strictly increasing positive function with bounded derivative. However, there is no closed form for Φ(x), so approximate values are used in practice. It is easy to see Φ(0) = 1/2, Φ(x) = 1 − Φ(−x), and

$$1 - \Phi(x) = \frac{1}{\sqrt{2\pi}\, x}\, e^{-x^2/2}\,(1 + o(1)), \quad x \to \infty. \tag{1}$$

A widely used simple inequality is, for each x > 0,

$$1 - \Phi(x) \le \frac{1}{\sqrt{2\pi}\, x}\, e^{-x^2/2}.$$

Let X be a random variable with probability density function p(x). Then it follows that

$$EX^{2k-1} = 0, \quad EX^{2k} = (2k-1)!!, \quad k \ge 1, \qquad Ee^{tX} = e^{t^2/2}, \quad t \in \mathbb{R}.$$

Let Y = µ + σX; then Y has probability density function $p(x; \mu, \sigma^2)$.
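The basic facts about Φ can be checked numerically. The following is a minimal sketch, assuming Φ is evaluated through the complementary error function of the Python standard library; the function names `phi`, `Phi`, `upper_tail` and `even_moment` are illustrative choices, not notation from the text. It evaluates the symmetry Φ(x) = 1 − Φ(−x), the tail bound 1 − Φ(x) ≤ p(x)/x for x > 0, and the moment formula EX^{2k} = (2k − 1)!!.

```python
import math


def phi(x: float) -> float:
    """Standard normal density p(x)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def Phi(x: float) -> float:
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def upper_tail(x: float) -> float:
    """1 - Phi(x), computed without cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def even_moment(k: int) -> int:
    """E X^(2k) = (2k - 1)!! for a standard normal X."""
    result = 1
    for odd in range(1, 2 * k, 2):
        result *= odd
    return result


if __name__ == "__main__":
    for x in (1.0, 3.0, 10.0):
        bound = phi(x) / x
        # The ratio approaches 1 from below as x grows.
        print(x, upper_tail(x), bound, upper_tail(x) / bound)
```

The ratio printed in the last column illustrates that the upper bound is also the correct asymptotic order of the tail.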

Central Limit Theorems
The first landmark achievement in probability theory is Bernoulli's law of large numbers for binomial random variables. It gives a rigorous mathematical interpretation of the belief that the frequency of an event has a limit as the number of trials becomes large. Specifically, suppose that ξ_1, ξ_2, · · · is a sequence of i.i.d. random variables with P(ξ_1 = 1) = p, P(ξ_1 = 0) = 1 − p, where 0 < p < 1. Let $S_n = \sum_{i=1}^n \xi_i$; then as n → ∞, for any ε > 0,

$$P\Big(\Big|\frac{S_n}{n} - p\Big| > \varepsilon\Big) \to 0.$$

This is the most important result in Bernoulli's celebrated book Ars Conjectandi, published in 1713. In modern terminology, it reads as follows:

$$\frac{S_n}{n} \xrightarrow{P} p, \quad n \to \infty.$$
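Following Bernoulli's work, the second great achievement is the central limit theorem. De Moivre (p = 1/2, 1733) and Laplace (p ≠ 1/2, 1812) established

$$\frac{S_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} N(0, 1), \quad n \to \infty.$$

In other words, for any real numbers a < b,

$$P\Big(a < \frac{S_n - np}{\sqrt{np(1-p)}} \le b\Big) \to \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$

This can be seen from the asymptotic formula $n! = \sqrt{2\pi n}\, n^n e^{-n} (1 + o(1))$ discovered by De Moivre and Stirling.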
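The De Moivre-Laplace approximation and Stirling's formula are easy to examine numerically. The sketch below is only an illustration under stated assumptions: the binomial probabilities are computed exactly through log-gamma values, and the helper names `binom_pmf`, `normalized_binomial_prob`, `normal_prob` and `stirling_ratio` are hypothetical.

```python
import math


def binom_pmf(n: int, k: int, p: float) -> float:
    """P(S_n = k) for S_n ~ Binomial(n, p), computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def normalized_binomial_prob(n: int, p: float, a: float, b: float) -> float:
    """Exact P(a < (S_n - np) / sqrt(np(1-p)) <= b)."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if a < (k - mean) / sd <= b)


def normal_prob(a: float, b: float) -> float:
    """Phi(b) - Phi(a) for the standard normal distribution."""
    return 0.5 * (math.erfc(a / math.sqrt(2.0)) - math.erfc(b / math.sqrt(2.0)))


def stirling_ratio(n: int) -> float:
    """n! divided by sqrt(2 pi n) n^n e^(-n); tends to 1 as n grows."""
    log_stirling = 0.5 * math.log(2.0 * math.pi * n) + n * math.log(n) - n
    return math.exp(math.lgamma(n + 1) - log_stirling)


if __name__ == "__main__":
    for n in (25, 100, 400, 1600):
        print(n, normalized_binomial_prob(n, 0.3, -1.0, 1.0), normal_prob(-1.0, 1.0))
    print(stirling_ratio(50))
```

For moderate n the exact and limiting probabilities differ mainly through the lattice effect of the binomial distribution, which vanishes at rate n^{-1/2}.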
De Moivre and Laplace discovered the above limit distribution for a very simple and special model. A natural question is whether such a result holds true in other cases. This is by no means an easy problem; in fact, it was not solved until the 1920s, with the introduction of characteristic functions. The Lévy-Feller central limit theorem states that if X_1, X_2, · · · is a sequence of i.i.d. r.v.'s and $S_n = \sum_{i=1}^n X_i$, then

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1), \quad n \to \infty,$$

if and only if EX_1 = µ and Var(X_1) = σ² < ∞. There have since been several different proofs in the literature, among them the characteristic function method (Lévy continuity theorem), the Lindeberg replacement trick, and Stein's equation (Stein's continuity theorem). The importance of this theorem lies in the fact that no assumption is made on the underlying distribution of the random variables X_i: the limit distribution of normalized sums of i.i.d. r.v.'s is normal whenever the second moment is finite. Moreover, the limit theorem remains valid for more general classes of random variables. It has been proven for sums of non-i.i.d. r.v.'s, martingale differences, m-dependent r.v.'s, mixing sequences, and even for random matrix eigenvalues and determinantal point processes. In other words, the central limit theorem is a truly universal principle in probability theory. Note that the linear structure of partial sums plays an essential role in the validity of the central limit theorem. Due to their universality and simplicity, the normal distribution and central limit theorems have been widely used in many research areas, such as statistical inference (hypothesis testing and confidence interval estimation), financial markets (stock pricing), insurance risk (premium pricing), and particle physics (motion paths).

Functional Central Limit Theorems
A culminating achievement of probability theory in the 1950s and 1960s is the following Donsker invariance principle. Let ξ_1, ξ_2, · · · be a sequence of i.i.d. random variables with mean zero and variance 1, and let $S_n = \sum_{i=1}^n \xi_i$. Define for each t ∈ [0, 1]

$$X_n(t) = \frac{1}{\sqrt{n}} \Big( S_{[nt]} + (nt - [nt])\, \xi_{[nt]+1} \Big).$$

Then X_n ⇒ B, where X_n = (X_n(t), 0 ≤ t ≤ 1) and B = (B(t), 0 ≤ t ≤ 1) is Brownian motion (see Subsection 1.5 below). Namely, for any bounded continuous function f : C[0, 1] → R,

$$Ef(X_n) \to Ef(B), \quad n \to \infty.$$

This theorem not only proves the existence of Brownian motion, but also provides a unified framework for solving diverse problems. Let h : C[0, 1] → R^d (d ≥ 1) be a continuous (or even slightly discontinuous) mapping; then

$$h(X_n) \xrightarrow{d} h(B), \quad n \to \infty. \tag{2}$$

This is referred to as the functional central limit theorem. Note that the limit h(B) does not depend on the distribution of ξ_i, either. As an illustration, we consider the following case. Letting $h(x) = \sup_{0 \le t \le 1} x(t)$, then by (2)

$$\frac{1}{\sqrt{n}} \max_{0 \le k \le n} S_k \xrightarrow{d} \sup_{0 \le t \le 1} B(t), \quad n \to \infty,$$

where S_0 = 0. In turn, to figure out explicitly the limit $P(\sup_{0 \le t \le 1} B(t) \le x)$, we let ξ_i be the special random variable with P(ξ_i = ±1) = 1/2, and apply the formula (2) once again:

$$P\Big(\sup_{0 \le t \le 1} B(t) \le x\Big) = \lim_{n \to \infty} P\Big(\frac{1}{\sqrt{n}} \max_{0 \le k \le n} S_k \le x\Big).$$

Now we can apply the reflection principle of simple symmetric random walks to yield

$$P\Big(\sup_{0 \le t \le 1} B(t) \le x\Big) = 2\Phi(x) - 1, \quad x \ge 0.$$

Thus we get the limit distribution of $\frac{1}{\sqrt{n}} \max_{0 \le k \le n} S_k$ for general random variables. This methodology is expounded at length in the classic book by Billingsley [4], and its uses are endless. Here we emphasize that such ideas will be repeatedly used in the context of the GUE Tracy-Widom distribution and the Airy 2 process as well.
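The reflection-principle step can be made concrete for the simple symmetric random walk. The sketch below rests on the standard identity P(max_{0≤k≤n} S_k ≥ m) = P(S_n = m) + 2P(S_n > m) for integers m ≥ 0. It checks this identity against a brute-force dynamic program and compares it with the Brownian limit 2(1 − Φ(x)) at m = x√n. The function names are illustrative, not taken from the text.

```python
import math
from collections import defaultdict


def walk_pmf(n: int, s: int) -> float:
    """P(S_n = s) for the simple symmetric random walk started at 0."""
    if (n + s) % 2 or abs(s) > n:
        return 0.0
    return math.comb(n, (n + s) // 2) / 2 ** n


def max_tail_reflection(n: int, m: int) -> float:
    """P(max_{0<=k<=n} S_k >= m) for an integer m >= 0, by the reflection principle."""
    return walk_pmf(n, m) + 2.0 * sum(walk_pmf(n, s) for s in range(m + 1, n + 1))


def max_tail_bruteforce(n: int, m: int) -> float:
    """Same probability by dynamic programming over (position, running maximum)."""
    states = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (s, mx), pr in states.items():
            for step in (-1, 1):
                t = s + step
                nxt[(t, max(mx, t))] += 0.5 * pr
        states = nxt
    return sum(pr for (_, mx), pr in states.items() if mx >= m)


def brownian_sup_tail(x: float) -> float:
    """P(sup_{0<=t<=1} B(t) > x) = 2 (1 - Phi(x)) for x >= 0."""
    return math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    n = 2500  # sqrt(n) = 50, so m = 50 corresponds to x = 1
    print(max_tail_reflection(n, 50), brownian_sup_tail(1.0))
```

The dynamic program is exponentially cheaper than enumerating paths but is still only practical for small n; the reflection formula is what makes large n tractable.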

Multivariate Normal Distributions
Let m ≥ 1. An m-dimensional random vector X = (X_1, X_2, · · · , X_m)′ is normal if the joint probability density function is given by

$$p(x) = \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \exp\Big( -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \Big), \quad x \in \mathbb{R}^m,$$

where µ = (µ_1, · · · , µ_m)′ and Σ is a positive definite matrix with diagonal entries σ_i² and off-diagonal entries ρ_ij σ_i σ_j. It is easy to see EX = µ, Var(X_i) = σ_i², and Cov(X_i, X_j) = ρ_ij σ_i σ_j for i ≠ j. X_i and X_j are uncorrelated if and only if they are independent of each other. For any 1 ≤ l < m, (X_{i_1}, X_{i_2}, · · · , X_{i_l}) is an l-dimensional normal random vector. It is easy to show that X is normal if and only if every linear combination of its components is normal. More generally, let A = (a_ij)_{l×m} be a real matrix; then Y = AX is an l-dimensional normal (possibly degenerate) random vector with EY = Aµ and Cov(Y) = AΣA′. In particular, letting A = Σ^{−1/2}, the vector Y = A(X − µ) is a normal vector consisting of i.i.d. standard normal components. Note that a normal random vector is uniquely determined by its mean vector and covariance structure.
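The whitening transformation A = Σ^{−1/2} can be checked by hand in dimension two. The following sketch uses the closed form √M = (M + √(det M) I)/√(tr M + 2√(det M)) for a 2×2 symmetric positive definite matrix, which follows from the Cayley-Hamilton theorem; the matrix helpers are hypothetical and written out in plain Python to stay self-contained.

```python
import math


def sym_sqrt_2x2(m):
    """Symmetric positive definite square root of a 2x2 SPD matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    s = math.sqrt(a * d - b * b)        # sqrt(det M)
    t = math.sqrt(a + d + 2.0 * s)      # sqrt(tr M + 2 sqrt(det M))
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix."""
    a, b, c, d = m[0][0], m[0][1], m[1][0], m[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def whitening_matrix(sigma):
    """A = Sigma^(-1/2), so that Cov(A X) = A Sigma A' is the identity."""
    return inverse_2x2(sym_sqrt_2x2(sigma))


def covariance_after(a, sigma):
    """Cov(A X) = A Sigma A' for a vector X with covariance Sigma."""
    return matmul(matmul(a, sigma), transpose(a))


if __name__ == "__main__":
    sigma = [[4.0, 1.2], [1.2, 1.0]]    # sigma_1 = 2, sigma_2 = 1, rho = 0.6
    print(covariance_after(whitening_matrix(sigma), sigma))
```

In higher dimensions one would use an eigendecomposition or a Cholesky factor instead of the closed form, which is specific to the 2×2 case.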

Gaussian Processes with Sample Paths
A random process G = (G(t), t ≥ 0) is Gaussian if for any k ≥ 1 and any t_1, t_2, · · · , t_k ≥ 0, (G(t_1), G(t_2), · · · , G(t_k)) is a k-dimensional normal random vector. Gaussian processes have been well studied and have a wide range of applications. We only take Brownian motions and Ornstein-Uhlenbeck processes as two representative examples below.
Brownian motion is named after the Scottish botanist Robert Brown, who first observed the zigzag motion of pollen particles in a liquid. Following Bachelier, Einstein, Fréchet, Hilbert and Lebesgue, Wiener (1920s) proved the existence of Brownian motion by a rigorous construction and found many fundamental properties such as non-differentiability, (1/2 − δ)-Hölder continuity, unbounded variation and the quadratic variation process.
A standard Brownian motion B = (B(t), t ≥ 0) is a Gaussian process with EB(t) = 0 and EB(s)B(t) = min(s, t). It is easy to see that B(t) − B(s) ∼ N(0, t − s) for s < t, and hence E(B(t) − B(s))⁴ = 3(t − s)². According to Kolmogorov's continuity criterion, there exists a continuous version of B = (B(t), t ≥ 0). But the sample paths are extremely irregular; in particular, they are nowhere differentiable and have unbounded variation. Note that the components of a Gaussian vector are independent if and only if they are pairwise uncorrelated. Hence B = (B(t), t ≥ 0) has stationary independent increments, and so is a Lévy process. Further, it is a continuous time strong Markov process starting at 0 with transition density function

$$p_t(x, y) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{(y-x)^2}{2t}}, \quad t > 0, \; x, y \in \mathbb{R}.$$

Moreover, B = (B(t), t ≥ 0) is a continuous martingale with quadratic variation process ⟨B⟩_t = t. Assume that B = (B(t), t ≥ 0) is a Brownian motion, then so are the following three random processes.
(1) For any stopping time T , Since for any s < t and x > 0 P ( where Υ is a random variable with $P(\Upsilon > x) \le c_2 e^{-c_1 x^2}$. Further refinements have been made. In particular, over the past decades there have been many refined results on how big or how small the increments of Brownian motion (and even of Gaussian processes) are. The reader is referred to the classic book by Csörgő and Révész [9] for more information.
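As Brown first observed, Brownian motion walks along a zigzag path. How far does it reach within the time interval [0, 1]? It turns out that

$$P\Big(\sup_{0 \le t \le 1} B(t) > x\Big) = 2P(B(1) > x) = 2(1 - \Phi(x)), \quad x \ge 0.$$

By the scaling property, namely $(B(ct), t \ge 0) \overset{d}{=} (\sqrt{c}\, B(t), t \ge 0)$ for any c > 0, one can compute the probability $P(\sup_{0 \le t \le T} B(t) > x)$. Trivially, $\sup_{0 \le t \le T} B(t)$ increases to infinity as T → ∞.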
The maximum of B(t) − t² over [0, ∞) plays an important role in certain recent studies on the asymptotic distributions of tests for monotone hazards, based on integral-type statistics measuring the distance between the empirical cumulative hazard function and its greatest convex minorant. Groeneboom and Temme [16] in 2010 analyzed the tail behavior of the maximum and gave an asymptotic expansion for $P(\sup_{t \ge 0} (B(t) - t^2) > x)$: where τ_k is a sequence of numerical constants. The maximum of a two-sided Brownian motion on (−∞, ∞) started at zero can be obtained by independence and a simple probabilistic analysis.
Let us turn to the second example, the Ornstein-Uhlenbeck process. Let B = (B(t), t ≥ 0) be a standard Brownian motion and λ > 0. Define

$$X(t) = e^{-\lambda t} B(e^{2\lambda t}), \quad t \in \mathbb{R}.$$

Then EX(t) = 0 and $\mathrm{Cov}(X(s), X(t)) = e^{-\lambda |t - s|}$. Hence X = (X(t), t ∈ R) is a stationary Gaussian process whose correlation decays exponentially as the time gap increases, and so it is ergodic in the sense of mean.
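The stationarity and exponential decay of correlations can be verified directly from the covariance of Brownian motion. The sketch below assumes the standard time-change representation X(t) = e^{−λt} B(e^{2λt}) of the stationary Ornstein-Uhlenbeck process; the names `bm_cov` and `ou_cov` are illustrative.

```python
import math


def bm_cov(s: float, t: float) -> float:
    """Cov(B(s), B(t)) = min(s, t) for a standard Brownian motion."""
    return min(s, t)


def ou_cov(s: float, t: float, lam: float) -> float:
    """Cov(X(s), X(t)) for X(t) = exp(-lam * t) * B(exp(2 * lam * t))."""
    scale = math.exp(-lam * (s + t))
    return scale * bm_cov(math.exp(2.0 * lam * s), math.exp(2.0 * lam * t))


if __name__ == "__main__":
    lam = 0.5
    for gap in (0.0, 0.5, 1.0, 2.0, 4.0):
        # The covariance depends only on the gap and equals exp(-lam * gap).
        print(gap, ou_cov(1.0, 1.0 + gap, lam), math.exp(-lam * gap))
```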
This process can also be induced by a stochastic differential equation dX(t) = −X(t)dt + dB(t), X(0) ∼ N(0, 1). The probability density function p(x, t) satisfies the Fokker-Planck equation

$$\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial x^2} + \frac{\partial (x p)}{\partial x}$$

with initial value $p(x, 0) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. We also remark that the Ornstein-Uhlenbeck process can be interpreted as a scaling limit of a discrete process, in the same way that Brownian motion is a scaling limit of random walks.
Before concluding this section, we take a quick look at the regularity properties of sample paths of general Gaussian processes. Suppose X = (X(t), t ∈ T) is a centered Gaussian process, where T is an index set. Define the pseudo-metric

$$d_X(s, t) = \big( E(X(t) - X(s))^2 \big)^{1/2}, \quad s, t \in T.$$

Let N(T, d_X, δ) denote the smallest number of d_X-balls of radius δ needed to cover the set T; its logarithm is the metric entropy. Then for any δ > 0

$$c_1 \delta \sqrt{\log N(T, d_X, \delta)} \le E \sup_{t \in T} X(t) \le c_2 \int_0^\infty \sqrt{\log N(T, d_X, \varepsilon)}\, d\varepsilon. \tag{4}$$

The LHS of (4) follows from the Slepian comparison lemma, while the RHS of (4) follows from the chaining argument due to Dudley in the 1960s and 1970s. However, there is a gap between the upper bound and the lower bound. This was resolved by Fernique and Talagrand by introducing the concept of majorizing measures and using the refined generic chaining argument. The final result is as follows. Let

$$\gamma_2(T, d_X) = \inf \sup_{t \in T} \sum_{k=0}^{\infty} 2^{k/2}\, d_X(t, T_k),$$

where the infimum is with respect to all admissible sequences (T_k). Then

$$c_1 \gamma_2(T, d_X) \le E \sup_{t \in T} X(t) \le c_2 \gamma_2(T, d_X).$$

As for the tail probability estimates, we have the following rough result: for any x > 0,

$$P\Big( \sup_{t \in T} X(t) - E \sup_{t \in T} X(t) > x \Big) \le e^{-x^2 / (2\sigma_T^2)}, \quad \sigma_T^2 = \sup_{t \in T} EX(t)^2.$$

The reader is referred to the recent graduate textbooks by Vershynin [36] and Wainwright [38] for more information.

§2 Tracy-Widom Distribution

The Tracy-Widom distribution was first discovered in the study of spectral properties of high-dimensional random matrices by two mathematicians, C. Tracy and H. Widom from California, in the 1990s. Their paper Level-spacing distributions and the Airy kernel, published in 1994 in Communications in Mathematical Physics, was awarded the 2020 Steele Prize for Seminal Contribution to Research in analysis/probability theory.
From a statistician's perspective, the introduction of the Tracy-Widom distributions has been a breakthrough of lasting importance. The Tracy-Widom distributions characterize the limiting distribution of the top eigenvalue in the null hypothesis case of no structure, a challenge for statisticians since the 1950s. In particular, the distribution function F(2;s) governs the complex-valued data of signal processing.
We remark that in the literature the term Tracy-Widom distributions usually refers to a family of distributions. However, the present paper mainly focuses on the Tracy-Widom distribution related to the GUE matrix model, and so we simply use the terminology Tracy-Widom distribution without confusion. In Subsection 2.1 we start with the definition of the Tracy-Widom distribution, denoted by F_2, in terms of a solution to the Painlevé II equation, then briefly review some basic properties such as expectation, variance and tail asymptotics, and finally give the Fredholm determinant expression for F_2. In Subsection 2.2 we address the issue of how F_2 arises by a quick look at Tracy and Widom's original GUE model. The research area around F_2 is rapidly growing.

Tracy-Widom Distribution
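To introduce the Tracy-Widom distribution, we need some notation. Let q(x) be the unique solution to the following Painlevé II equation

$$q''(x) = x q(x) + 2 q^3(x), \quad q(x) \sim \mathrm{Ai}(x) \text{ as } x \to \infty,$$

and define

$$F_2(x) = \exp\Big( -\int_x^\infty (y - x)\, q^2(y)\, dy \Big), \quad x \in \mathbb{R}. \tag{5}$$

It is not clear from the above formula (5) whether F_2 is a distribution function. But it is, namely F_2 is nondecreasing with F_2(−∞) = 0 and F_2(∞) = 1. Figure 1 displays the density curve of F_2. Obviously, it is no longer symmetric. As x → ∞,

$$1 - F_2(x) = \frac{1}{16\pi x^{3/2}}\, e^{-\frac{4}{3} x^{3/2}}\, (1 + o(1)), \tag{6}$$

$$F_2(-x) = \frac{2^{1/24} e^{\zeta'(-1)}}{x^{1/8}}\, e^{-\frac{x^3}{12}}\, (1 + o(1)), \tag{7}$$

where ζ stands for the Riemann zeta function. It is easy to see that the left tail probability decays faster than $e^{-x^2/2}$, while the right tail probability decays more slowly than $e^{-x^2/2}$. We also remark that, in contrast to (1), it takes lengthy and laborious effort to obtain the exact constants in the asymptotic expansions (6) and (7). Let X be a random variable distributed as F_2. Then µ = EX ≈ −1.7710868074 and σ² = Var(X) ≈ 0.8131947928. It turns out that F_2 has another representation, as a Fredholm determinant, which is very helpful in manipulation. Let Ai(x) be a solution to the following equation

$$\mathrm{Ai}''(x) = x\, \mathrm{Ai}(x).$$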

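It can be written as

$$\mathrm{Ai}(x) = \frac{1}{\pi} \int_0^\infty \cos\Big( \frac{t^3}{3} + x t \Big)\, dt, \quad x \in \mathbb{R}.$$

The extension to the whole complex plane is given by

$$\mathrm{Ai}(z) = \frac{1}{2\pi i} \int_C e^{t^3/3 - z t}\, dt,$$

where C denotes the contour in the complex plane consisting of the ray joining $e^{-i\pi/3}\infty$ to the origin plus the ray joining the origin to $e^{i\pi/3}\infty$. The Airy function and the associated Airy kernel A(x, y), see (8) below, appear repeatedly in the literature. The reader is referred to the book [37] by Vallée and Soares for commonly used properties. Let A(x, y) be the Airy kernel

$$A(x, y) = \frac{\mathrm{Ai}(x)\mathrm{Ai}'(y) - \mathrm{Ai}'(x)\mathrm{Ai}(y)}{x - y} = \int_0^\infty \mathrm{Ai}(x + u)\, \mathrm{Ai}(y + u)\, du. \tag{8}$$

The Fredholm determinant of A on L²(x, ∞) is defined by

$$\det(I - A)_{L^2(x, \infty)} = 1 + \sum_{k=1}^{\infty} \frac{(-1)^k}{k!} \int_{(x, \infty)^k} \det\big( A(x_i, x_j) \big)_{1 \le i, j \le k}\, dx_1 \cdots dx_k. \tag{9}$$

The sum in the RHS of (9) is convergent by Hadamard's inequality for determinants and the upper bound of Ai(x). The reader is referred to Chapter 3 of the book [1] by Anderson, Guionnet and Zeitouni for a rigorous proof and more information.
It was Tracy and Widom who proved that for any x ∈ R,

$$F_2(x) = \det(I - A)_{L^2(x, \infty)}.$$

Indeed, after first taking the logarithm and then the second-order derivative on both sides, it reduces to proving

$$\frac{d^2}{dx^2} \log \det(I - A)_{L^2(x, \infty)} = -q^2(x);$$

see [1] for the remaining computation.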
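The Fredholm determinant representation lends itself to direct numerical evaluation. The following is a rough sketch, not a production method, assuming F_2(s) is the determinant of I − A on L²(s, ∞) with A the Airy kernel (Ai(x)Ai′(y) − Ai′(x)Ai(y))/(x − y). The determinant is discretized by a midpoint rule on the truncated interval [s, `upper`] (a Nyström-type approximation), and Ai, Ai′ are computed from their Maclaurin series, which is adequate only for moderate arguments. All function names and the default parameters `upper=8.0` and `n=100` are assumptions made for this illustration.

```python
import math

C1 = 3.0 ** (-2.0 / 3.0) / math.gamma(2.0 / 3.0)   # Ai(0)
C2 = 3.0 ** (-1.0 / 3.0) / math.gamma(1.0 / 3.0)   # -Ai'(0)


def airy(x: float, terms: int = 80):
    """Return (Ai(x), Ai'(x)) from the Maclaurin series; adequate for -6 <= x <= 8."""
    x2, x3 = x * x, x * x * x
    f, g, df, dg = 1.0, x, 0.0, 1.0
    tf, tg = 1.0, x                      # current series terms of f and g
    for k in range(1, terms):
        df += tf * x2 / (3 * k - 1)      # derivative of the k-th term of f
        dg += tg * x2 / (3 * k)          # derivative of the k-th term of g
        tf *= x3 / ((3 * k - 1) * (3 * k))
        tg *= x3 / ((3 * k) * (3 * k + 1))
        f += tf
        g += tg
    return C1 * f - C2 * g, C1 * df - C2 * dg


def airy_kernel_matrix(nodes):
    """Matrix (A(x_i, x_j)) of the Airy kernel at the given nodes."""
    vals = [airy(x) for x in nodes]
    n = len(nodes)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ai_i, dai_i = vals[i]
        for j in range(n):
            if i == j:
                # Diagonal limit: Ai'(x)^2 - x Ai(x)^2.
                kernel[i][j] = dai_i ** 2 - nodes[i] * ai_i ** 2
            else:
                ai_j, dai_j = vals[j]
                kernel[i][j] = (ai_i * dai_j - dai_i * ai_j) / (nodes[i] - nodes[j])
    return kernel


def determinant(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    n, det = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            if factor != 0.0:
                for k in range(c + 1, n):
                    a[r][k] -= factor * a[c][k]
    return det


def tracy_widom_cdf(s: float, upper: float = 8.0, n: int = 100) -> float:
    """Approximate F_2(s) = det(I - A) on L^2(s, infinity) by a midpoint rule."""
    h = (upper - s) / n
    nodes = [s + (i + 0.5) * h for i in range(n)]
    kernel = airy_kernel_matrix(nodes)
    mat = [[(1.0 if i == j else 0.0) - h * kernel[i][j] for j in range(n)]
           for i in range(n)]
    return determinant(mat)


if __name__ == "__main__":
    for s in (-4.0, -3.0, -2.0, -1.0, 0.0, 1.0):
        print(s, tracy_widom_cdf(s))
```

Truncating at `upper` is harmless because the Airy function decays superexponentially on the positive axis; a Gauss-Legendre rule would converge much faster than the midpoint rule, at the cost of extra code.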

How does F 2 arise?
Having F_2 and its basic properties, it is natural to ask where F_2 arises from. It turns out that F_2 was first discovered when Tracy and Widom studied the limiting behaviours of spectral gaps of the high-dimensional Gaussian unitary ensemble (GUE) in their seminal paper [34] in the 1990s. The study of random matrices goes back to the 1930s in applied multivariate statistics, reached its golden epoch in physics in the 1950s and 1960s, and has entered a rapidly developing new era since the discovery of F_2.
GUE is a prototype of random matrix models. Let Z_ii, i ≥ 1, be a sequence of i.i.d. standard normal random variables, and Z_ij, 1 ≤ i < j, a sequence of i.i.d. complex standard normal random variables. All of these random variables are defined on a common probability space and are independent of each other. Let H_n = (Z_ij)_{n×n} be an n × n Hermitian random matrix, where Z_ji = Z*_ij. It is easy to see that the density of H_n is proportional to $\exp(-\frac{1}{2} \mathrm{tr} H^2)$. Let λ_1, λ_2, . . . , λ_n be its n eigenvalues. They are almost surely distinct real numbers since H_n is a Hermitian matrix with continuous entries. The remarkable result due to Weyl is that the distribution of the eigenvalues is absolutely continuous with respect to the Lebesgue measure and has joint probability density function

$$p_n(x_1, \cdots, x_n) = \frac{1}{Z_n} \prod_{1 \le i < j \le n} |x_i - x_j|^2 \prod_{k=1}^n e^{-x_k^2/2}, \tag{10}$$

where Z_n is a normalizing constant. Note that the density function p_n consists of two distinct parts: one is the square of the Vandermonde determinant, $\prod_{1 \le i < j \le n} |x_i - x_j|^2$ (the power 2 corresponds to the complex Hermitian matrix), and the other is the product of independent normal densities, $\prod_{k=1}^n e^{-x_k^2/2}$. So the eigenvalues repel each other and keep a certain gap between them.
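Let h_0(x) = 1, h_l(x), l ≥ 1, be the sequence of Hermite orthogonal polynomials, namely

$$\int_{-\infty}^{\infty} h_k(x) h_l(x) e^{-x^2/2}\, dx = \sqrt{2\pi}\, l!\, \delta_{kl},$$

and set $\varphi_l(x) = (2\pi)^{-1/4} (l!)^{-1/2} h_l(x) e^{-x^2/4}$. By a series of transformations of the Vandermonde determinant, (10) can be written as

$$p_n(x_1, \cdots, x_n) = \frac{1}{n!} \det\big( K_n(x_i, x_j) \big)_{1 \le i, j \le n}, \tag{11}$$

where $K_n(x, y) = \sum_{l=0}^{n-1} \varphi_l(x) \varphi_l(y)$. (11) plays an important role in the study of GUE. In fact, this is a prototype of determinantal point processes, a term coined by Borodin and Olshanski around 2000. It is easy to derive from (11) any k-dimensional marginal density of (λ_1, λ_2, . . ., λ_k):

$$p_{n,k}(x_1, \cdots, x_k) = \frac{(n-k)!}{n!} \det\big( K_n(x_i, x_j) \big)_{1 \le i, j \le k}. \tag{12}$$

In particular, the 1-dimensional probability density is

$$p_{n,1}(x) = \frac{1}{n} K_n(x, x). \tag{13}$$

Now one can derive the celebrated Wigner semicircle law for the empirical spectral measure from (13) and the asymptotic behaviours of Hermite polynomials. For any a < b,

$$\frac{1}{n} E\, \#\Big\{ 1 \le i \le n : a < \frac{\lambda_i}{\sqrt{n}} \le b \Big\} \to \frac{1}{2\pi} \int_a^b \sqrt{4 - x^2}\, \mathbf{1}_{(|x| \le 2)}\, dx.$$

The above semicircle law basically tells us that the majority of eigenvalues lie in [−2√n, 2√n] with high probability. In fact, set $\lambda_{(n)} = \max_{1 \le i \le n} \lambda_i$; then by the concentration inequality for a Gaussian random variable and the Borel-Cantelli lemma,

$$\frac{\lambda_{(n)}}{\sqrt{n}} \to 2 \quad \text{a.s.}$$

Next comes a very interesting and challenging issue: what is the asymptotic distribution of λ_(n) around 2√n? It follows from (12) that

$$P(\lambda_{(n)} \le x) = 1 + \sum_{k=1}^{n} \frac{(-1)^k}{k!} \int_{(x, \infty)^k} \det\big( K_n(x_i, x_j) \big)_{1 \le i, j \le k}\, dx_1 \cdots dx_k, \tag{14}$$

where the last term in (14) is a finite series of Fredholm determinant type. And again by asymptotic properties of Hermite polynomials,

$$\frac{1}{n^{1/6}} K_n\Big( 2\sqrt{n} + \frac{x}{n^{1/6}},\; 2\sqrt{n} + \frac{y}{n^{1/6}} \Big) \to A(x, y), \quad n \to \infty,$$

where A(x, y) is the Airy kernel given by (8).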
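The one-point density (1/n) K_n(x, x) can be evaluated with a stable three-term recurrence. The sketch below assumes the orthonormal Hermite functions φ_l for the weight e^{−x²/2}, which satisfy φ_0(x) = (2π)^{−1/4} e^{−x²/4} and φ_{l+1}(x) = (x φ_l(x) − √l φ_{l−1}(x))/√(l+1). It then integrates the density with a midpoint rule to examine how much mass lies in [−2√n, 2√n]. The function names and step counts are illustrative.

```python
import math


def hermite_functions(n: int, x: float):
    """phi_0(x), ..., phi_{n-1}(x): orthonormal Hermite functions for weight exp(-x^2/2)."""
    phi = [math.exp(-x * x / 4.0) / (2.0 * math.pi) ** 0.25]
    if n > 1:
        phi.append(x * phi[0])
    for l in range(1, n - 1):
        phi.append((x * phi[l] - math.sqrt(l) * phi[l - 1]) / math.sqrt(l + 1))
    return phi[:n]


def one_point_density(n: int, x: float) -> float:
    """(1/n) K_n(x, x), the density of a uniformly chosen GUE eigenvalue."""
    return sum(v * v for v in hermite_functions(n, x)) / n


def integrate_density(n: int, lo: float, hi: float, steps: int = 4000) -> float:
    """Midpoint-rule integral of the one-point density over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(one_point_density(n, lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    n = 20
    edge = 2.0 * math.sqrt(n)
    print(integrate_density(n, -edge - 8.0, edge + 8.0))   # total mass, close to 1
    print(integrate_density(n, -edge, edge))               # mass inside the bulk
    # Rescaled density at the origin versus the semicircle value 1/pi.
    print(math.sqrt(n) * one_point_density(n, 0.0), 1.0 / math.pi)
```

The forward recurrence is used because, for fixed x, the Hermite functions form the dominant solution of the recurrence in the index l, so rounding errors are not amplified.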
In summary, we have sketched out the proof that for x ∈ R,

$$P\big( n^{1/6} (\lambda_{(n)} - 2\sqrt{n}) \le x \big) \to F_2(x), \quad n \to \infty.$$

As noticed above, Tracy and Widom [34] only considered the very special case of GUE so that they could apply Weyl's joint probability density representation and the nice determinantal structure. How universal is the F_2 distribution? In fact, soon after the work of Tracy and Widom, Baik, Deift and Johansson [2] proved that F_2 is the asymptotic distribution of the length of the longest increasing subsequence of a random permutation. Specifically, let n ≥ 1, and let S_n be the symmetric group of permutations of {1, 2, · · · , n}. Given a permutation π = (π(1), π(2), . . . , π(n)) ∈ S_n chosen uniformly at random, define the length of the longest increasing subsequence of π by

$$l_n(\pi) = \max\{ k : \pi(i_1) < \pi(i_2) < \cdots < \pi(i_k) \text{ for some } 1 \le i_1 < i_2 < \cdots < i_k \le n \}.$$

Logan and Shepp [24] and Vershik and Kerov [35] proved almost at the same time in 1977 that

$$\frac{l_n}{\sqrt{n}} \xrightarrow{P} 2, \quad n \to \infty.$$

A longstanding problem was what the fluctuation around 2√n looks like. It turns out that

$$\frac{l_n - 2\sqrt{n}}{n^{1/6}} \xrightarrow{d} F_2, \quad n \to \infty.$$

This is remarkable. As the reader might notice, there is no link between GUE and random permutations in terms of finite models, but they have a common limit distribution F_2. This is an encouraging and enlightening result. There has since been intensive research activity around F_2, and a dozen or so apparently distinct models have been found to have F_2 as the asymptotic fluctuation law of a suitably scaled statistic. Below is a short (absolutely not exhaustive) list of the models; detailed references are readily available in the literature. The interested reader is referred to the survey [33] for model patterns and main results.
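The length of the longest increasing subsequence is cheap to compute, which makes the 2√n law easy to explore by simulation. The sketch below uses patience sorting with binary search, an O(n log n) technique chosen here for illustration and not taken from the cited papers; the function names and the fixed random seed are assumptions.

```python
import bisect
import math
import random


def longest_increasing_subsequence_length(perm) -> int:
    """Length of the longest increasing subsequence, by patience sorting."""
    tails = []  # tails[i] = smallest possible last value of an increasing run of length i + 1
    for value in perm:
        i = bisect.bisect_left(tails, value)
        if i == len(tails):
            tails.append(value)
        else:
            tails[i] = value
    return len(tails)


def mean_scaled_length(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E l_n / sqrt(n) over uniform random permutations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        total += longest_increasing_subsequence_length(perm)
    return total / (trials * math.sqrt(n))


if __name__ == "__main__":
    for n in (100, 400, 1600):
        # The ratio approaches 2 slowly, since E l_n is roughly 2 sqrt(n) - 1.77 n^(1/6).
        print(n, mean_scaled_length(n, trials=50))
```

The slow approach to 2 visible in the output reflects the negative mean of F_2 acting at scale n^{1/6}.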
We remark that the above models differ substantially from one another. In fact, each individual model has its own features in either algebraic or analytic structure. There is no unified approach that works well for all models.
Deift presented a series of unsolved problems in random matrix theory and integrable systems at the 2006 conference, where Problem 10 was called a Tracy-Widom central limit theorem, see [3]. We rephrase it as follows. Let . What is the operation f so that the X_n's converge in distribution to F_2? Important progress towards answering this question has been made recently, and independently, by Baik and Suidan and by Bodineau and Martin, but the full problem remains wide open and very challenging.

§3 Airy 2 process

The Airy 2 process was first introduced by Prähofer and Spohn [27] in the study of random fields, and has attracted a lot of attention recently. In Subsection 3.1 we define the Airy 2 process through its finite dimensional distributions and give two alternative representations. In Subsection 3.2 we give the short range correlation due to Prähofer and Spohn [27], which implies that the process is continuous in probability, as well as the long range correlation due to Widom [39], which decays polynomially. In Subsection 3.3 we describe the striking local Brownian motion behaviour of the Airy 2 process. In Subsection 3.4 we explain how the Airy 2 process arises from Dyson's Brownian motions, while in Subsection 3.5 we introduce an extended Airy point process, whose largest particle locations induce the Airy 2 process.

Airy 2 process
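The precise definition of the Airy 2 process goes as follows. The extended Airy kernel is defined by

$$A^{ext}(t, x; t', x') = \begin{cases} \displaystyle \int_0^\infty e^{-\lambda (t - t')}\, \mathrm{Ai}(x + \lambda)\, \mathrm{Ai}(x' + \lambda)\, d\lambda, & t \ge t', \\[2mm] \displaystyle -\int_{-\infty}^0 e^{-\lambda (t - t')}\, \mathrm{Ai}(x + \lambda)\, \mathrm{Ai}(x' + \lambda)\, d\lambda, & t < t'. \end{cases} \tag{15}$$

Obviously, A^{ext}(t, x; t, x′) = A(x, x′). Note that by a result due to Okounkov [25], for t < t′,

$$\int_{-\infty}^{\infty} e^{-\lambda (t - t')}\, \mathrm{Ai}(x + \lambda)\, \mathrm{Ai}(x' + \lambda)\, d\lambda = \frac{1}{\sqrt{4\pi (t' - t)}} \exp\Big( -\frac{(x - x')^2}{4(t' - t)} - \frac{(t' - t)(x + x')}{2} + \frac{(t' - t)^3}{12} \Big). \tag{16}$$

We remark that the RHS of (16) is a form of normal density function for fixed t, t′, and it follows that for t < t′,

$$A^{ext}(t, x; t', x') = \int_0^\infty e^{-\lambda (t - t')}\, \mathrm{Ai}(x + \lambda)\, \mathrm{Ai}(x' + \lambda)\, d\lambda - \frac{1}{\sqrt{4\pi (t' - t)}} \exp\Big( -\frac{(x - x')^2}{4(t' - t)} - \frac{(t' - t)(x + x')}{2} + \frac{(t' - t)^3}{12} \Big). \tag{17}$$

The Airy process A_2 = (A_2(t), t ∈ R) is defined through its finite dimensional distributions, which are given by a Fredholm determinantal formula: given x_1, x_2, · · · , x_m and t_1 < t_2 < · · · < t_m in R,

$$P\big( A_2(t_1) \le x_1, \cdots, A_2(t_m) \le x_m \big) = \det\big( I - \chi^{1/2} A^{ext} \chi^{1/2} \big)_{L^2(\{t_1, \cdots, t_m\} \times \mathbb{R})}, \tag{18}$$

where we have counting measure on {t_1, · · · , t_m} and Lebesgue measure on R, and χ is defined on {t_1, · · · , t_m} × R by $\chi(t_i, x) = \mathbf{1}_{(x > x_i)}$. We remark that the operator χ^{1/2} A^{ext} χ^{1/2} in the RHS of (18) is actually a trace class operator on L²({t_1, · · · , t_m} × R), so the determinant is well-defined and equals

$$1 + \sum_{k=1}^{\infty} \frac{(-1)^k}{k!} \sum_{i_1, \cdots, i_k = 1}^{m} \int_{x_{i_1}}^{\infty} \cdots \int_{x_{i_k}}^{\infty} \det\big( A^{ext}(t_{i_r}, y_r; t_{i_s}, y_s) \big)_{1 \le r, s \le k}\, dy_1 \cdots dy_k.$$

Trivially, for any t ∈ R, A_2(t) ∼ F_2.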
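The above Fredholm determinant allows another expression. Let H denote the Airy Hamiltonian operator:

$$H = -\frac{d^2}{dx^2} + x,$$

and let $\bar{P}_a$ denote the projection onto the interval (−∞, a]:

$$\bar{P}_a f(x) = \mathbf{1}_{(x \le a)} f(x).$$

Then

$$P\big( A_2(t_1) \le x_1, \cdots, A_2(t_m) \le x_m \big) = \det\big( I - A + \bar{P}_{x_1} e^{(t_1 - t_2) H} \bar{P}_{x_2} e^{(t_2 - t_3) H} \cdots \bar{P}_{x_m} e^{(t_m - t_1) H} A \big)_{L^2(\mathbb{R})}. \tag{21}$$

Indeed, let Then it follows det On the other hand, note T + is an upper triangular matrix, and where we have used the fact that det(1 + PT + ) L 2 (R) = 1. Furthermore, , which is as desired.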
Both (18) and (21) are very useful in the exploration of the Airy 2 process. The operator in the formula (18) acts on the extended space L²({t_1, · · · , t_m} × R), while the Fredholm determinant in (21) is computed on the Hilbert space L²(R). The latter formula avoids a major difficulty when the number m of times t_i goes to infinity. Another advantage of (21) is that it makes apparent that A_2 is a stationary process, since only the time increments t_i − t_{i+1} appear in the joint distribution.
In summary, the Airy 2 process (A_2(t), t ∈ R) is a stationary stochastic process whose f.d.d.'s are given through a Fredholm determinant with the extended Airy kernel A^{ext} as in (18), and whose 1-dimensional distribution is F_2.

Short and Long Range Correlations
In this subsection we look at the short and long range correlation for A 2 (t). Since it is stationary, we focus on Cov(A 2 (0), A 2 (t)) where t > 0.
The short range correlation was first investigated by Prähofer and Spohn [27], in which the Airy 2 process was introduced. Their result is as follows:

$$E\big( A_2(t) - A_2(0) \big)^2 = 2|t| + O(t^2), \quad t \to 0, \tag{22}$$

which immediately implies that A_2(t) is continuous in probability. However, (22) does not directly imply the a.s. continuity of sample paths of A_2(t), see Section 4.
The long range correlation was first studied by Widom [39], who got the following result:

$$\mathrm{Cov}\big( A_2(0), A_2(t) \big) = \frac{1}{t^2} + O\Big( \frac{1}{t^4} \Big), \quad t \to \infty. \tag{23}$$

Later on, Shinault and Tracy [32] extended this expansion to any finite order.
To see (23), let f_2(x) = F_2′(x) be the probability density function of the GUE Tracy-Widom distribution, and define . Integrating by parts and noting that F_2 and f_2 decay exponentially at ±∞, it is easy to check In turn, let . So, we turn to analyze the determinant det . Repeatedly integrating by parts, we get For clarity of writing, we introduce the following notation. Denote f ⊗ g(x, y) = f(x)g(y), and let It remains to evaluate the traces tr(T_12 T_21) and tr(T_12 T_21)². To this end, we use the fact that $(f \otimes g)(h \otimes k) = \langle g, h \rangle_{L^2}\, f \otimes k$, whose trace is $\langle g, h \rangle_{L^2} \langle f, k \rangle_{L^2}$. Shinault and Tracy [32] found that In combination, we have proved (24), and so (23).
Since (A_2(t), t ∈ R) is a stationary asymptotically uncorrelated stochastic process, it is ergodic in the sense of mean. Namely,

$$\frac{1}{2T} \int_{-T}^{T} A_2(t)\, dt \to EA_2(0) \quad \text{in } L^2, \quad T \to \infty.$$

As a by-product of (23), (A_2(t), t ∈ R) is not Markovian. In fact, for a stationary Markov process, the correlation must decay exponentially if it decays at all.

Local Brownian Motion Phenomena
As observed above, a normal distribution usually arises as a limiting distribution of normalized sums of i.i.d. r.v.'s with finite second-order moments, while F_2 is typically used to describe a new class of extremal phenomena such as the largest eigenvalue of Hermitian matrices with i.i.d. random entries under certain moment conditions. Brownian motion is a Gaussian process with stationary independent increments (Markovian), while the Airy 2 process is a stationary non-Markovian process with 1-dimensional distribution F_2. At first glance, they are two totally different objects appearing in different settings. However, as the following result shows, the Airy 2 process behaves locally like a Brownian motion. This is striking: how does a normal distribution appear in extremal phenomena?
Define for ε > 0

$$Z_\varepsilon(t) = \frac{A_2(\varepsilon t) - A_2(0)}{\sqrt{2\varepsilon}}, \quad t \ge 0,$$

and consider the process Z_ε = (Z_ε(t), t ≥ 0). The so-called local Brownian motion property mathematically reads as follows:

$$Z_\varepsilon \Rightarrow B, \quad \varepsilon \to 0, \tag{25}$$

where B is a standard Brownian motion. The proof of (25) consists of two parts as usual: the finite dimensional distributions converge weakly, and Z_ε is uniformly tight in C[0, ∞). The uniform tightness of Z_ε was first established by Pimentel [26] using the comparison theorem on last passage percolation models. In fact, it now readily follows from the tail probability estimate (40) on the modulus of continuity for the Airy 2 process.
In the following paragraphs we will briefly outline the basic ideas for finite dimensional convergence due to Hägg [17]. It suffices to show for any m ≥ 1 and t 1 < t 2 < · · · < t m , where y i = x 0 + √ 2εz i . Hence, it is sufficient to prove for any x 0 , x 1 , · · · , x m ∈ R and s 1 , s 2 , · · · , s m > 0, lim δ0,··· ,δm→0 where Let us start with the probability P ( in the denominator of (26).
. Note an elementary calculus gives Thus it follows On the other hand, Here we used the relation (38) and the correlation formula (37) for extended Airy point process in Subsection 3.5, the time line was R 0 . In the sequel we will also use such an approach to moment estimates, the time lines will be clear from the context. For brevity we omit the subscript in ρ.
Hence we have Similarly, since $\tilde{N}_0 \le N_0 (N_0 - 1)$, it follows Let us turn to the probability in the numerator of (26).

The major term is
where will play a major role in estimating (29).
Analogously to (27) and (28), it follows It follows as δ 0 , · · · , δ m → 0 In combination, we have lim δ0,··· ,δm→0 This is quite technical and lengthy. For an illustration, we only consider a simple case, that is m = 1 and k = 1. Note where by (17) A ext (0, . Hence a simple algebra yields We need to compute the limit of each entry in the matrix of RHS of (31) as ε → 0. Note
In combination, we have ) .
Thus we have completed the proof of (30) in the case of m = k = 1.
To conclude this subsection, we mention a simple result about Brownian bridge.

How does Airy 2 process arise?
The Airy 2 process A_2(t) was first introduced by Prähofer and Spohn [27] in exploring polynuclear growth processes. It has since been found to appear in a dozen or so apparently distinct fields. Only recently, it was proved that the directed landscape has the KPZ fixed point as its marginal, while the KPZ fixed point has Airy processes as its marginals. In this subsection we give a quick introduction starting from the eigenvalues of stationary random matrix-valued processes.
Let B_ij = (B_ij(t), t ≥ 0), i, j ≥ 1, be an array of i.i.d. standard real Brownian motions, and let ($\tilde{B}_{ij}$, i, j ≥ 1) be an independent copy of (B_ij, i, j ≥ 1). Define $A_{ij}(t) = B_{ij}(t) + i \tilde{B}_{ij}(t)$ and then $H_{ij}(t) = \frac{1}{2} \big( A_{ij}(t) + \overline{A_{ji}(t)} \big)$. The Hermitian matrix-valued Brownian motion is defined by H_n(t) = (H_ij(t))_{n×n}, t ≥ 0. (32) In particular, H_n(1) is the well-known GUE. What we are concerned with in the following paragraphs are stationary Ornstein-Uhlenbeck matrix-valued random processes defined by the SDE dM_n(t) = −M_n(t)dt + dH_n(t), t > 0. The stationary distribution associated with M_n(t) is GUE. For any finite times t_1 < t_2 < · · · < t_m, the joint probability density function for where $\rho_{s,t} = e^{-(t-s)/2}$ and Z_{n,m} is a normalizing constant.
Let λ_n(t) = (λ_{n,1}(t), λ_{n,2}(t), · · · , λ_{n,n}(t)) be the eigenvalues of M_n(t) in increasing order. Then as a result of Itô's formula, λ_n(t) is a diffusion process and satisfies the following SDE where W_i, 1 ≤ i ≤ n, are n independent standard Brownian motions. The corresponding diffusion operator is Furthermore, the equilibrium distribution associated with L is given by This is the joint probability density function of eigenvalues of GUE, which is the same up to a constant factor as in (10). The above diffusion process (λ_n(t), t ≥ 0) was first introduced by Dyson [13] to describe the diffusion of n mutually repelling particles. It is now named Dyson's Brownian motion in the literature. According to the classic Karlin-McGregor formula for non-intersecting diffusion processes, it follows for s < t ) ) , and so by the Markov property we have This is a basic example of extended determinantal point processes. Indeed, let X_0 = X_m = {0, 1, 2, · · · , n − 1}, X_r = R, 1 ≤ r ≤ m − 1, and equip X_r with the Lebesgue measure dµ_r. Let is the normalized Hermite polynomials (see (11)), and Then (33) is written as follows It is proved in [20,21] that the probability measure (34) has a determinantal correlation function, that is, the probability density with respect to the reference measure dµ_{r_1}(y_1) · · · dµ_{r_k}(y_k) of finding particles at z_1 = (r_1, y_1), · · · , z_k = (r_k, y_k) is given by det(K_{n,m}(z_i, z_j))_{1≤i,j≤k}, kernel multiplying with exp(−x²/2 + y²/2), ··· ,tm}×R) . Now consider the rescaled largest eigenvalue given by Recall the Plancherel-Rotach formula for asymptotics of Hermite polynomials It follows that for t > s K ext ) ∼ e n 2/3 (t−s) x; s, y). As a consequence, we have so far proved the following: $\tilde{\lambda}_{n,n}(t) \Rightarrow A_2(t)$, n → ∞, in the sense of convergence of finite dimensional distributions.

The extended Airy point process
Motivated by the Dyson's Brownian motions, we can give the third representation of Airy 2 process as a marginal process of an extended Airy point process.
Let m ≥ 1 be an arbitrary integer, and let t 1 < t 2 < · · · < t m points in R which we shall think of as times. Define E = R t1 ∪ R t2 ∪ · · · ∪ R tm . R tj is referred to as the time line t j . Assume that X is the space of all locally finite countable configurations of points (or particles) in E. Let Σ be the minimal σ-algebra that contains all cylinder sets. The so-called extended Airy point process is such a measure on the space (E, Σ) that its k-point (1 ≤ k ≤ m) correlation function has the extended Airy kernel A ext , see (15).
In particular, let $z_1 = (t_{r_1}, x_1), \cdots, z_k = (t_{r_k}, x_k)$ be $k$ points from $E$, where $r_1, \cdots, r_k$ are possibly the same, then
It is a well-known fact that on each time line $\mathbb{R}_{t_i}$ there is almost surely a largest particle $\lambda(t_i)$. Moreover, it follows (38)
Determinantal point processes have become a very active object of study in probability theory since Borodin and Olshanski coined the term around 2000. Interestingly, both Poisson processes and Brownian motions are determinantal point processes.

§4 Sample Path Properties

Modulus of Continuity for Airy 2 process
As we observed, the determinantal formula (18) and its equivalent expression (21) are very useful in the definition of the Airy 2 process and in the study of long range correlations, stationarity, and local Brownian behaviour. However, it is difficult to derive even the most basic path properties, such as continuity, directly from these formulas, see Prähofer and Spohn [27]. There has been considerable recent progress in the exploration of sample path properties. In particular, a useful technique, called the Brownian Gibbs property, was developed in [6-8, 10, 11].
The main result of this subsection is the modulus of continuity for the Airy 2 process. It reads as follows.
We now proceed with the proof of (41). We need to introduce the concept of a line ensemble and the Brownian Gibbs property.
A line ensemble $L = (L(i, t), i \ge 1, t \in \mathbb{R})$ is a collection of random continuous curves indexed by $\mathbb{N}$, each curve $L(i, \cdot)$ mapping $\mathbb{R}$ into $\mathbb{R}$. If $L(i, t) > L(j, t)$ for all $i < j$ and all $t \in \mathbb{R}$, then $L$ is said to be a non-intersecting line ensemble.
For any $k_1 < k_2$ and any $a < b$,
A line ensemble $R = (R(i, t), i \ge 1, t \in \mathbb{R})$ is called the Airy line ensemble if its finite dimensional distributions are given by the extended Airy point process (see Subsection 3.5). Corwin and Hammond [6] proved the existence of the Airy line ensemble: there exists a unique continuous non-intersecting $\mathbb{N}$-indexed line ensemble with finite dimensional distributions given by the extended Airy point process. Moreover, they proved that the $\mathbb{N}$-indexed line ensemble $L$ given by $L(i, t) = 2^{-1/2}(R(i, t) - t^2)$ for each $i \in \mathbb{N}$ has the Brownian Gibbs property. Some authors refer to $L$ as the parabolic Airy line ensemble. Trivially, $R(1, t) = A_2(t)$.
To prove (41), we may and do assume $x > 7\delta^{3/2}$; otherwise, just change the values of the constants.
Then it follows is a standard Brownian bridge and B 0 (u) =B 0 (u) + y(u).
On the other hand, we have P ( ) .

Note it follows
where in the last inequality we used the fact that $b > 8\delta^2$. It is known that there exist $c_{13}, c_{14}$ such that for any $x > c_{14}$ P ( (45)
We refer to Dauvergne and Virág [10] for the proofs of (44) and (45). Putting these together concludes the proof of (41).
As a corollary, we have for any $x > c_{17}$
We remark that the proof of (39) does depend on the intricacies of the jump ensemble method, a method that exploits the Brownian Gibbs property of the parabolic Airy line ensemble in order to make inferences about its curves, notably the ones at its edge, such as $A_2$ after parabolic shift. As Hammond pointed out (private communication): "The method may seem complex and indirect, though it is in its fundamentals attractively probabilistic and geometric. I am not aware that the result can be proved by viewing the Airy process in isolation. It is by embedding it in the richer Airy line ensemble that this valuable Brownian Gibbs method becomes available."

Maximum of Airy 2 process
In this subsection, we turn to the distribution of the maximum of $A_2(t)$ over the real line $\mathbb{R}$. Define $M = \max_{t \in \mathbb{R}} (A_2(t) - t^2)$ and $T = \arg\max_{t \in \mathbb{R}} (A_2(t) - t^2)$. As an analogue of (3), we have the following remarkable result:
$P(M \le x) = F_1(4^{1/3} x)$, $x \in \mathbb{R}$. (46)
Here $F_1$ is the limiting distribution of the largest eigenvalue of the Gaussian orthogonal ensemble (GOE), and has a close link with $F_2$ through
The formula (46) was first conjectured by Johansson [20], who explored discrete growth models such as polynuclear growth (PNG) models and last passage percolation (LPP) times in the planar lattice. Indeed, $F_2$ and $F_1$ are the limiting distributions of the point-to-point and point-to-line LPP times, respectively, after proper scaling; see Johansson [20] and Prähofer and Spohn [27] for detailed information. On the other hand, the point-to-line LPP time can be simply computed as the maximum of the point-to-point LPP times. Thus (46) can be proved, though in a very indirect way.
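The observation that the point-to-line LPP time is the maximum of the point-to-point LPP times can be illustrated on a small lattice. The following is a minimal sketch under stated assumptions: up/right paths start at the origin, the line is the antidiagonal $i + j = n$, and the site weights are i.i.d. exponential (a common choice, but an assumption here, as are all function names). The point-to-point times are computed by the usual recursion $G(i,j) = w(i,j) + \max(G(i-1,j), G(i,j-1))$, and the point-to-line time is computed independently by enumerating all $2^n$ paths.

```python
import math
import random
from itertools import product


def sample_weights(n, rng):
    """i.i.d. Exp(1) weights on the sites (i, j) with i, j >= 0 and i + j <= n."""
    return {(i, s - i): rng.expovariate(1.0)
            for s in range(n + 1) for i in range(s + 1)}


def point_to_point(weights, n):
    """Last passage times G[(i, j)] from (0, 0) to (i, j) over up/right paths."""
    G = {}
    for s in range(n + 1):
        for i in range(s + 1):
            j = s - i
            if s == 0:
                prev = 0.0
            else:
                prev = max(G.get((i - 1, j), -math.inf),
                           G.get((i, j - 1), -math.inf))
            G[(i, j)] = weights[(i, j)] + prev
    return G


def point_to_line_via_max(weights, n):
    """Point-to-line time as the maximum of point-to-point times on i + j = n."""
    G = point_to_point(weights, n)
    return max(G[(i, n - i)] for i in range(n + 1))


def point_to_line_brute(weights, n):
    """Point-to-line time by enumerating every n-step up/right path from (0, 0)."""
    best = -math.inf
    for steps in product((0, 1), repeat=n):
        i = j = 0
        total = weights[(0, 0)]
        for step in steps:
            if step:
                i += 1
            else:
                j += 1
            total += weights[(i, j)]
        best = max(best, total)
    return best


if __name__ == "__main__":
    w = sample_weights(8, random.Random(2020))
    print(point_to_line_via_max(w, 8), point_to_line_brute(w, 8))
```

The two printed values agree for any choice of weights, which is the finite-lattice version of the relation between $F_1$ and the maximum of the Airy 2 process after parabolic shift.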
A direct proof was later provided by Corwin, Quastel and Remenik [8]. It relies largely on the following two fundamental facts, which are of interest in their own right. The first is the determinantal representation of $F_1$ obtained by Ferrari and Spohn [15]:
The second fact is that, letting $H$ be as in (19) and $R$ a reflection operator with kernel given by and defining $\Theta_T = P_{x+T^2}(e^{-2TH} - R_T)P_{x+T^2}$, it follows P ( sup
Note that $e^{TH} R_T e^{TH} = \varrho_x$, where $\varrho_x$ is the reflection operator, i.e., $\varrho_x f(u) = f(2x - u)$. Hence . Finally, to see the identity (46), we need only note that $A = B_0 P_0 B_0$ and $B_0 \varrho_x B_0(u, v) = 2^{-1/3}\mathrm{Ai}(2^{-1/3}(u + v + 2x))$. The reader is referred to the survey on Airy 2 processes and variational problems by Quastel and Remenik [29] for an explicit derivation.
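The origin of the factor $4^{1/3}$ in (46) can be seen from a change of variables in the kernel $B_0 \varrho_x B_0$. The sketch below assumes the Ferrari-Spohn representation in the commonly quoted form $F_1(s) = \det(I - P_0 B_s P_0)$, where $B_s(u, v) = \mathrm{Ai}(u + v + s)$ and $P_0$ is the projection onto $(0, \infty)$; the symbols $B_s$ and $\hat{B}_x$ are notation introduced for this illustration only.

```latex
% Assumption: F_1(s) = \det(I - P_0 B_s P_0), with B_s(u,v) = \mathrm{Ai}(u+v+s).
\begin{align*}
\hat{B}_x(u,v) &:= B_0 \varrho_x B_0(u,v)
   = 2^{-1/3}\,\mathrm{Ai}\bigl(2^{-1/3}(u+v+2x)\bigr), \\
% Rescale u = 2^{1/3}u', v = 2^{1/3}v'; this maps (0,\infty) onto itself
% and leaves the Fredholm determinant unchanged.
2^{1/3}\,\hat{B}_x\bigl(2^{1/3}u',\,2^{1/3}v'\bigr)
   &= \mathrm{Ai}\bigl(u'+v'+2^{2/3}x\bigr)
   = B_{4^{1/3}x}(u',v'), \\
\det\bigl(I - P_0 \hat{B}_x P_0\bigr)
   &= \det\bigl(I - P_0 B_{4^{1/3}x} P_0\bigr)
   = F_1\bigl(4^{1/3}x\bigr).
\end{align*}
```

The only arithmetic involved is $2^{-1/3} \cdot 2x = 2^{2/3}x = 4^{1/3}x$, which is exactly the argument of $F_1$ in (46).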
We now turn to $T$. The study of $T$ has received considerable recent interest in the physics literature, since the distribution of $T$ is believed to be universal, governing the rescaled endpoint of directed polymers in 1+1 dimensions for large time or temperature. Flores, Quastel and Remenik [14] obtained an explicit expression for the distribution of $T$. Later on, Quastel and Remenik [30] started from a probabilistic argument and used the continuum statistics formula (47) to establish the following tail decay: there exists a $c_{20}$ such that for every $c_{21} > \frac{32}{3}$ and large enough $t$,
$e^{-c_{20} t^3} \le P(|T| > t) \le c_{21} e^{-(4/3)t^3 + 2t^2 + O(t^{3/2})}$. (48)
The order $e^{-ct^3}$ confirms a prediction made in the physics literature. It is believed that the $-4/3$ in the upper bound is the correct exponent. In particular, it is open to show log P (|T | > t) The key step in the proof of (48) is to prove that there exists a constant $c_T > 0$ for each $T \ge 1$ such that for every $x > 0$

Concluding Remarks
We have restricted ourselves to the Airy 2 process $A_2$ and its sample paths. As seen above, $A_2$ is a comparatively new stationary non-Markovian continuous random process, and is well worth further attention. In addition to the Airy 2 process $A_2$, other closely related processes have appeared, say $A_1$, $A_{stat}$, $A_{2 \to 1}$, $A_{2 \to BM}$, $A_{1 \to BM}$, in the recent study around