Amalgamated Free Lévy Processes as Limits of Sample Covariance Matrices

We prove the existence of joint limiting spectral distributions for families of random sample covariance matrices modeled on fluctuations of discretized Lévy processes. These models were first considered in applications of random matrix theory to financial data, where datasets exhibit both strong multicollinearity and non-normality. When the underlying Lévy process is non-Gaussian, we show that the limiting spectral distributions are distinct from Marčenko–Pastur. In the context of operator-valued free probability, it is shown that the algebras generated by these families are asymptotically free with amalgamation over the diagonal subalgebra. This framework is used to construct operator-valued ∗-probability spaces, where the limits of sample covariance matrices play the role of non-commutative Lévy processes whose increments are free with amalgamation.

We consider datasets arranged into an N × p matrix of features X. The bulk behavior of such a matrix is studied through its empirical spectral distribution (ESD), the point-mass probability measure on its spectrum. In practice, many datasets exhibit strong multicollinearity when p and N are comparably large. In this scenario, p/N ∼ O(1), and we choose to model X as a large random matrix with stable rectangular shape. In what follows, we implicitly take N = N(p) as a function of the asymptotic parameter p ∈ N.

Definition 1
We say that a sequence of N × p random matrices X_p is a λ-shaped ensemble if the collection of entries [X_p]_{i,j} for p ∈ N, 1 ≤ i ≤ N, and 1 ≤ j ≤ p are jointly independent, and if N/p → λ ∈ (0, ∞) as p → ∞.
Describing the limiting spectral properties of matrices like (1/N) X_p† X_p and its variants is a long-standing problem in random matrix theory. We say that a sequence of square random matrices has a limiting spectral distribution if their (random) ESDs converge weakly to some probability measure almost surely. The study of the pure-noise case, where the entries [X_p]_{i,j} are i.i.d. with finite variance, was initiated by Marčenko and Pastur [17]. The criteria on X_p that ensure (1/N) X_p† X_p follows the Marčenko–Pastur law have a long history [3,11,21,25], with one branch culminating in the generous conditions of Tao and Vu [23], which require only that the collection of entries across the asymptotic parameter p shares some uniformly bounded (2 + ε)-moment.
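As a numerical illustration of this pure-noise baseline (a sketch added here, not part of the original argument; it assumes NumPy, and all variable names are ad hoc), the eigenvalues of (1/N) X† X for i.i.d. standard normal entries concentrate on the Marčenko–Pastur support [(1 − √c)², (1 + √c)²] with c = p/N:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure-noise data matrix: N x p with i.i.d. standard normal entries.
N, p = 4000, 1000
X = rng.standard_normal((N, p))

# Sample covariance matrix, normalized by the number of observations N.
S = X.T @ X / N
eigs = np.linalg.eigvalsh(S)

# Marchenko-Pastur support for unit-variance entries:
# [(1 - sqrt(c))^2, (1 + sqrt(c))^2] with aspect ratio c = p / N.
c = p / N
lower, upper = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
```

At this matrix size, all eigenvalues fall inside the predicted support up to small edge fluctuations, and the mean eigenvalue is close to the entry variance 1.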
As the conditions for the Marčenko–Pastur law continued to weaken, its "universality" inspired a number of covariance matrix cleaning techniques, not the least of which were applied to financial data [2,10,13]. As the motivation behind these techniques was the shape and bounds of the Marčenko–Pastur law, it was suggested that real datasets would exhibit this bulk shape whenever some volatility in the data could be attributed to noise that was approximately Gaussian-like. As financial data are ubiquitously non-Gaussian, however, efforts were made to extend the law to the heavier-tailed setting. Along these lines, Biroli et al. [8] investigated an ensemble of random matrices with Student's t column norms, while Guionnet and collaborators began a program of limit theorems for matrices with i.i.d. heavy-tailed entries [4,5,7,9].
In [28], it was shown that intraday equity data in various markets fail to match the scalability implied by Marčenko–Pastur with distinct values of λ. Appealing to the heavy-tailed setting does not resolve the issue: The heavy-tailed pure-noise Marčenko–Pastur law described in [4] produces large eigenvalues with heavy tails themselves, which contradicts well-observed phenomena in extreme asset returns [16,18]. In order to address these concerns, the author considered instead a sequence of random matrices X_p whose columns are drawn from the fluctuations of a stochastic process X_t over a fixed interval [0, T]. Specifically, after discretizing the interval [0, T] into a series of N + 1 points t_i = i · T/N with 0 ≤ i ≤ N, we let the entries of X_p follow the distributions

[X_p]_{i,j} =_d X_{t_i} − X_{t_{i−1}}.

In this way, each column of X_p is understood to represent the fluctuations of an independent copy of the process X_t over [0, T].
If we continue to impose the condition that entries of X p are i.i.d. for each p, then X t must be a stochastic process with independent and time-invariant increments. Such properties specify that X t is a Lévy process, and the entries of X p will therefore follow an infinitely divisible distribution. This correspondence leads naturally to the following model.

Definition 2 (Sample Lévy Covariance Ensemble)
Let (μ, λ) be a pair consisting of an infinitely divisible distribution μ ∈ ID(∗) and a shape parameter λ ∈ (0, ∞). A sample Lévy covariance ensemble (SLCE) C_p driven by data (μ, λ) is a sequence of p × p Wishart-type random matrices C_p = X_p† X_p, where X_p is a λ-shaped rectangular ensemble whose entries follow the distributions

[X_p]_{i,j} =_d X_{1/p},

where X_t is a Lévy process such that μ = L(X_1).

The main contribution of this paper is the spectral convergence of SLCE matrices in Theorem 1, which extends Lemma 2 introduced in [28] to cover the case of an SLCE driven by an arbitrary Lévy process.
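For concreteness, a minimal simulation of an SLCE (an illustrative sketch, not part of the original; it assumes NumPy and takes μ to be a compound Poisson law with rate r and unit jumps, so that X_{1/p} is Poisson(r/p)-distributed, and all names are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative SLCE driven by a compound Poisson process with unit jumps:
# the entries of X_p follow X_{1/p} ~ Poisson(r / p).
p = 1000
lam = 1.0                  # shape parameter: N / p -> lambda
N = int(lam * p)
r = 1.0                    # Poisson rate; kappa_2[X_1] = r for unit jumps

X = rng.poisson(r / p, size=(N, p)).astype(float)   # entries ~ X_{1/p}
C = X.T @ X                                         # SLCE matrix C_p

eigs = np.linalg.eigvalsh(C)
mean_eig = eigs.mean()     # = tr(C_p)/p, concentrating near lambda * kappa_2
```

Note that most entries of X_p are exactly zero here, so the spectrum carries a large atom at zero, visibly unlike the Marčenko–Pastur bulk of the Gaussian-like setting.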
Our model intersects with the world of non-commutative probability in the following way. A trivial but far-reaching property of sample covariance matrices is that they can be decomposed in terms of blocks of their observations. Writing X_p in K row blocks X_{p,1}, . . . , X_{p,K}, where X_{p,k} consists of rows N(k−1)/K + 1 to Nk/K, we then have

C_p = X_p† X_p = Σ_{k=1}^K X_{p,k}† X_{p,k} = Σ_{k=1}^K C_{p,k}.   (1)

Each C_{p,k} is an SLCE with parameters (μ, λ/K). The expression of C_p as an independent sum of matrices with identical limiting spectral distributions parallels the classical case of infinite divisibility. The recent result of Au et al. [1], connecting operator-valued free probability and permutation-invariant random matrices, makes this precise: The matrices C_{p,k} in decomposition (1) are independent and permutation-invariant, and therefore asymptotically free with amalgamation over the subalgebra of diagonal matrices. Prior to this result, techniques from free probability were typically restricted to unitarily invariant ensembles of random matrices. This development provides a rich framework for us to understand C_p as asymptotically modeling a noncommutative Lévy process in an operator-valued ∗-algebra.
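The block decomposition of the Gram matrix is an exact algebraic identity, which a quick sketch can verify numerically (assuming NumPy; not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(2)

# Splitting the rows of an N x p matrix into K blocks decomposes the
# Gram matrix exactly: X^T X = sum_k X_k^T X_k.
N, p, K = 120, 40, 4
X = rng.standard_normal((N, p))

blocks = np.split(X, K, axis=0)            # the row blocks X_{p,k}
C = X.T @ X                                # C_p
C_sum = sum(Xk.T @ Xk for Xk in blocks)    # sum of the C_{p,k}

decomposition_ok = np.allclose(C, C_sum)
```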
The outline of the article is as follows. In Sect. 2, we sketch the foundations of Lévy processes in order to establish the decomposition in Lemma 1. The full reference for this section is the treatise by Sato [19]. The proof of Theorem 1 is provided at the end of Sect. 3, followed by some immediate corollaries. In Sect. 5, we use the SLCE to construct an operator-valued ∗-algebra and an accompanying amalgamated free Lévy process with prescribed moments.
One might ask where these results fit into the larger context of random covariance matrices. Our ensembles lie outside the domain of attraction for Marčenko–Pastur (and the analogous heavy-tailed case [4]) because their entries are independent but not identically distributed across the asymptotic parameter p. Because of the increasing roughness of the entries, the matrices fail to meet the conditions of [23] and others. On the other hand, they fit well into the world of covariance matrices with exploding moments [6,14]. In these works, the i.i.d. entries of the data X_p have normalized even moments with some prescribed behavior, following the covariance form of the Zakharevich condition [27]. Under our normalization, this condition is equivalent to limits of the form

c_2n = lim_{p→∞} p · E[ [X_p]_{1,1}^{2n} ]

for some sequence c_2n indexed by the even integers. From the proof of Theorem 1, we have the following: If c_2n ∈ [0, ∞] is the sequence of even cumulants of an infinitely divisible probability distribution, then it can be realized as the Zakharevich sequence of an ensemble of random covariance matrices. We note that this includes sequences of even moments of arbitrary probability distributions, through the moment-cumulant correspondence given by compound Poisson processes.

Decomposition of Lévy Processes
Throughout, if X is a real-valued random variable, then we write L(X) for the law of X, a probability distribution on R. Equality in distribution, X =_d Y, is shorthand for equality in law, L(X) = L(Y).

Definition 3
A (classical) Lévy process X_t is a stochastically continuous càdlàg process such that X_0 = 0 almost surely and, for any 0 ≤ s < t, the increment X_t − X_s is independent of the history (X_u)_{u≤s} and satisfies X_t − X_s =_d X_{t−s}.

Definition 4
A probability distribution μ is said to be (classically) infinitely divisible, μ ∈ ID(∗), if for any n ∈ N, there exists a distribution denoted by μ^{∗1/n} such that

μ = μ^{∗1/n} ∗ · · · ∗ μ^{∗1/n} (n times).

Here, the symbol ∗ stands for the additive convolution of probability measures. Similarly, a random variable X is said to be ID(∗) if for any n ∈ N, we can write

X =_d X^{(1)} + · · · + X^{(n)}

for some i.i.d. random variables X^{(1)}, . . . , X^{(n)}.

Theorem 3 (Lévy-Khintchine representation [19])
There is a one-to-one correspondence between ID(∗) distributions and Lévy processes, such that each μ ∈ ID(∗) can be realized as the distribution of a Lévy process X_t at unit time t = 1. Furthermore, the cumulant generating function for a Lévy process X_t is well defined as a continuous function of θ ∈ R and has a representation given by

ψ_{X_t}(θ) = log E[e^{iθ X_t}] = t ( iaθ − (1/2) b θ² + ∫_R ( e^{iθx} − 1 − iθx 1_{|x|≤1} ) Π(dx) ).   (2)

The unique triplet (a, b, Π), called the data of X_t, consists of constants a ∈ R and b ≥ 0, and a nonnegative Borel measure Π on R with no atom at zero, such that for any ε > 0 (or, equivalently, for only ε = 1), we have

Π({x : |x| > ε}) < ∞ and ∫_{|x| ≤ ε} x² Π(dx) < ∞.

The Borel measure Π is called the Lévy measure of X_t. We write μ^{∗t} = L(X_t), well defined for all t ≥ 0.
We recall that the cumulants κ_n are defined in terms of the moment-cumulant formula, which states that for a random variable X with finite moments m_j[X] = E[X^j] up to order n, they are the unique values κ_j[X] such that the following n equations are satisfied:

m_j[X] = Σ_{π ∈ P(j)} ∏_{B ∈ π} κ_{|B|}[X], 1 ≤ j ≤ n.   (3)

Here, each sum runs over all partitions π of the set {1, 2, . . . , j}, and the elements B ∈ π are subsets of {1, 2, . . . , j}.
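The moment-cumulant formula can be evaluated without enumerating partitions via the standard equivalent recursion m_n = Σ_k C(n−1, k−1) κ_k m_{n−k}. The following sketch (added for illustration, not from the paper; the function name is ad hoc) checks it on the Poisson case, whose cumulants are all equal to the rate and whose moments at rate 1 are the Bell numbers:

```python
from math import comb

def moments_from_cumulants(kappa, n):
    """Moments m_1..m_n from cumulants kappa[1..n] via the recursion
    m_n = sum_k C(n-1, k-1) * kappa_k * m_{n-k}, which is equivalent
    to summing products of cumulants over all set partitions."""
    m = [1.0]                                  # m_0 = 1
    for j in range(1, n + 1):
        m.append(sum(comb(j - 1, k - 1) * kappa[k] * m[j - k]
                     for k in range(1, j + 1)))
    return m[1:]

# Poisson(1): all cumulants equal 1, so the moments are the Bell numbers.
poisson_moments = moments_from_cumulants({k: 1.0 for k in range(1, 5)}, 4)
# -> [1.0, 2.0, 5.0, 15.0]
```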

Corollary 1
The cumulants κ_n[X_t], when they are finite, are given by the expressions

κ_1[X_t] = t ( a + ∫_{|x|>1} x Π(dx) ), κ_2[X_t] = t ( b + ∫_R x² Π(dx) ), and κ_n[X_t] = t ∫_R x^n Π(dx) for n ≥ 3.   (4)

We write ID_b(∗) for the set of all essentially bounded probability distributions.
Sato [19] provides a thorough account of essentially bounded Lévy processes. If the support of Π is contained in (−∞, B] for some minimal B ≥ 0, then the superexponential moments are finite for all 0 < β < 1/B and infinite for all β > 1/B, independent of t > 0. The cumulant generating function ψ_{X_t}(θ) can also be extended to an entire function in the argument θ ∈ C when X_t is essentially bounded. The key property of essentially bounded processes is that they are precisely those Lévy processes whose cumulants grow at most exponentially in size, such that the sequence κ_n[X_t]^{1/n} is bounded.

The following lemma shows that all Lévy processes can be decomposed into the independent sum of an essentially bounded process and a compound Poisson process with arbitrarily small probability of activation. Recall that a compound Poisson process is a Lévy process realized by the random sum

P_t = Σ_{j=1}^{N_{rt}} ζ_j,

where the ζ_j are i.i.d. random variables drawn from some fixed distribution, and N_t is a standard Poisson process which is independent of the ζ_j. The value r > 0 is called the rate of P_t, and when N_{rt} = 0 (the sum is empty) we say that P_t failed to activate. We note that the probability of this event is equal to e^{−rt}. Compound Poisson processes have a convenient description in terms of their jumps ζ_j: Their cumulant generating functions are given by

ψ_{P_t}(θ) = rt ( E[e^{iθζ_1}] − 1 ),

where E[e^{iθζ_1}] is the characteristic function of the jump distribution.
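The activation probability e^{−rt} is straightforward to check empirically (an illustrative sketch assuming NumPy, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(3)

# A compound Poisson increment over time t fails to activate (empty sum)
# exactly when N_{rt} = 0, an event of probability e^{-r t}.
r, t, trials = 2.0, 0.5, 200_000
jumps = rng.poisson(r * t, size=trials)    # number of jumps per increment
empirical = np.mean(jumps == 0)
theoretical = np.exp(-r * t)               # here e^{-1}
```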
Lemma 1
Let X_t be a Lévy process, and let r > 0 be a fixed constant. Then, there exists a decomposition

X_t =_d X_t^b + P_t,

where X_t^b is an essentially bounded Lévy process, and P_t is an independent compound Poisson process with rate less than or equal to r.
Proof Let (a, b, Π) be the data for the Lévy process X_t. Since Π({x : |x| > 1}) < ∞, we may choose B ≥ 1 large enough that Π_P(R) < r, where Π_P = Π|_{(B,∞)} and Π_b = Π − Π_P. Then X_t =_d X_t^b + P_t, where X_t^b is an essentially bounded Lévy process with data (a, b, Π_b), and P_t is an independent Lévy process with data (0, 0, Π_P). If Π_P(R) > 0, then P_t has cumulant generating function

ψ_{P_t}(θ) = t ∫_R ( e^{iθx} − 1 ) Π_P(dx) = t Π_P(R) ( E[e^{iθζ_1}] − 1 ).

This is precisely the form of a compound Poisson process with rate Π_P(R) < r and jump distribution given by the probability distribution Π_P(R)^{−1} Π_P(·).
Proof We aim to show that the λ-shaped rectangular ensemble X_p appearing in the definition of the SLCE matrices C_p can be scaled in order to satisfy the Zakharevich condition found in Benaych-Georges and Cabanal-Duvillard [6, Theorem 3.2]. Specifically, we set Y_p = √p · X_p, and we let X_t denote a Lévy process such that μ = L(X_1). Then, the entries of Y_p follow the distribution of √p X_{1/p}.
To show convergence of covariance matrices, it is sufficient to consider the Zakharevich condition on the even moments only:

lim_{p→∞} p · E[ X_{1/p}^{2n} ].   (6)

For n ∈ N fixed, the term on the right is simply the 2n-th moment of X_{1/p}. By the moment-cumulant formula (3), this moment can be expressed as a sum of products of cumulants κ_j[X_{1/p}] = κ_j[X_1]/p. Writing k_j for the number of blocks of size j in a partition of {1, . . . , 2n}, so that

Σ_{j=1}^{2n} j · k_j = 2n,   (7)

we can write (6) as a sum of terms which look like

p^{1 − Σ_{j=1}^{2n} k_j} ∏_{j=1}^{2n} κ_j[X_1]^{k_j},

and (7) guarantees that 1 − Σ_{j=1}^{2n} k_j ≤ 0. The terms for which 1 − Σ_{j=1}^{2n} k_j < 0 will converge to zero as p → ∞. There is only one term such that 1 − Σ_{j=1}^{2n} k_j = 0, namely k_j = 0 for all j except k_{2n} = 1. Therefore, as p → ∞ we have

lim_{p→∞} p · E[ X_{1/p}^{2n} ] = κ_{2n}[X_1].

Since X_t is essentially bounded, κ_{2n}[X_1]^{1/2n} is bounded, and the conditions of [6, Theorem 3.2] are met. We denote the limiting distribution by λ(μ), as it only depends on λ and the even cumulants of the distribution μ. Continuity in the parameter λ follows similarly from the same reference. For sequential continuity of a collection of essentially bounded processes X_t^{(j)}, we have that the cumulants κ_{2n}[X_1^{(j)}] each converge for fixed n ∈ N. By the continuity in the c_{2n} terms, we have continuity of the limiting distribution as desired.
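The scaling argument above can be sanity-checked for μ = Poisson(1), where κ_n[X_1] = 1 for all n and E[X_t^n] is the Touchard polynomial Σ_k S(n, k) t^k in the Stirling numbers of the second kind. The sketch below (added for illustration, not from the paper; helper names are ad hoc) shows p · E[X_{1/p}^4] → κ_4[X_1] = 1:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    # Stirling numbers of the second kind: partitions of n elements into k blocks.
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def poisson_moment(n, t):
    # E[X_t^n] for X_t ~ Poisson(t): Touchard polynomial sum_k S(n,k) t^k.
    return sum(stirling2(n, k) * t ** k for k in range(1, n + 1))

# Only the single-block partition survives the p * E[X_{1/p}^{2n}] scaling:
# p * E[X_{1/p}^4] = 1 + 7/p + 6/p^2 + 1/p^3 -> kappa_4[X_1] = 1.
vals = [p * poisson_moment(4, 1.0 / p) for p in (10, 1000, 100000)]
```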
Proof of Theorem 1. As above, we take a Lévy process X_t such that [X_p]_{i,j} =_d X_{1/p}. Our goal is to decompose the matrices X_p into the sum of two independent components, one of which is driven by an essentially bounded process and the other of which is low rank with high probability. By Lemma 1, we have a decomposition X_t =_d X_t^b + P_t into the sum of independent processes, where X_t^b is an essentially bounded process and P_t is a compound Poisson process with arbitrarily small rate r > 0. Therefore, we can write each matrix X_p as X_p = X_p^b + P_p, where the entries of X_p^b are i.i.d. following the distribution of X_{1/p}^b and the entries of P_p are i.i.d. following the distribution of P_{1/p}. Note that this equality is not simply in distribution, as we treat the entries of X_p as being generated by summing the independent entries of X_p^b and P_p. It follows that

rank( X_p† X_p − X_p^{b†} X_p^b ) ≤ 2 · (number of columns of X_p that differ from those of X_p^b) ≤ 2 · (number of nonzero columns of P_p).

Each column of P_p has N independent compound Poisson entries, each of which fails to activate with probability at least e^{−r/p}. The probability that all N entries fail to activate therefore satisfies

P( a specified column of P_p is all zero ) ≥ ( e^{−r/p} )^N = e^{−rN/p} → e^{−rλ}.

This allows us to treat the number of columns of X_p that differ from those of X_p^b as being bounded above by a binomial random variable with p trials and probability of success q ≤ 1 − e^{−rλ}. Using the Chernoff bound on binomial trials, we get that the event that this count exceeds 2(1 − e^{−rλ})p occurs only finitely many times almost surely. Let ε > 0 be given, and choose r > 0 such that 4(1 − e^{−rλ}) < ε. For this choice of r > 0, almost surely

rank( X_p† X_p − X_p^{b†} X_p^b ) ≤ ε p

for large enough p, so that by the rank inequality the ESDs of X_p† X_p and X_p^{b†} X_p^b differ by at most ε in Kolmogorov distance. By Lemma 2, we know that X_p^{b†} X_p^b has a limiting spectral distribution, which we will denote by μ_ε. Since this can be done for any ε > 0, it follows from the lemma of Benaych-Georges and Cabanal-Duvillard [6, Lemma 12.2] that a limiting distribution exists for X_p† X_p as well and is given by the weak limit lim_{ε→0+} μ_ε.
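The rank bound driving this proof, namely that changing m columns of X_p perturbs X_p† X_p by a matrix of rank at most 2m, can be verified directly (an illustrative sketch assuming NumPy, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(4)

# Changing m columns of X perturbs X^T X by rank at most 2m, since
# X^T X - Y^T Y = X^T (X - Y) + (X - Y)^T Y and X - Y is supported
# on m columns.
N, p, m = 200, 100, 5
X = rng.standard_normal((N, p))
Y = X.copy()
Y[:, :m] = rng.standard_normal((N, m))     # overwrite m columns

diff = X.T @ X - Y.T @ Y
rank = np.linalg.matrix_rank(diff)
```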
Continuity in the arguments can be derived from the continuity in the case of essentially bounded processes. Let (μ^{(j)}, λ_j) be a sequence of data converging to some (μ, λ), where we take μ^{(j)} → μ in distribution. If we let Π^{(j)} be the Lévy measures of μ^{(j)}, it follows [19] that for any fixed B > 1,

Π^{(j)}|_{(B,∞)}(R) → Π|_{(B,∞)}(R).   (9)

Therefore, if ε > 0 is given and some r > 0 is chosen as above, a B > 1 can be chosen uniformly across all j ∈ N. To see this, take some B_1 > 1 as in Lemma 1 so that Π|_{(B_1,∞)}(R) < r. Since (9) holds, there is some J ∈ N such that j > J implies that Π^{(j)}|_{(B_1,∞)}(R) < r.

By (2), a Lévy process is symmetric precisely when its Lévy measure Π is. It follows that every Lévy process can be symmetrized by considering a new process whose Lévy measure is given by

Π^s(A) = (1/2) ( Π(A) + Π(−A) )

for any Borel set A ⊆ R which does not contain a neighborhood of zero, where −A = {x ∈ R : −x ∈ A}. Both the process X_t and the resulting symmetric process X_t^s can be simultaneously approximated in distribution by essentially bounded Lévy processes with identical even cumulants. As the limiting spectral distribution of the SLCE is independent of the odd cumulants of the original process, it is invariant under the operation of symmetrization.

Corollary 3
Suppose μ, ν ∈ ID(∗) are infinitely divisible with Lévy measures Π_μ and Π_ν. If Π_μ^s = Π_ν^s, then λ(μ) = λ(ν) for all λ ∈ (0, ∞).

Proof In the proof of Lemma 2, the limiting distribution relies only on the even cumulants of the essentially bounded process. As in the proof of Theorem 1, we take Lévy processes X_t and Y_t such that L(X_1) = μ and L(Y_1) = ν. The decomposition of both processes X_t and Y_t into the independent sums X_t^b + P_t and Y_t^b + P_t' relies on truncating the Lévy measures Π_μ and Π_ν on sets [−B, B]. By (4), the even cumulants of the essentially bounded components are identical under the stated condition after choosing B > 0 for the two processes simultaneously. Therefore, the limiting spectral distributions for the matrices X_p^{b†} X_p^b and Y_p^{b†} Y_p^b are both equal to some μ_ε. Since the limiting distributions for both ensembles are weak limits of μ_ε, they are equal.
By equality of the nonzero eigenvalues of X_p† X_p and X_p X_p†, we have immediately a corresponding statement for λ ≥ 1, where the limiting spectral distributions for the shape parameters λ and 1/λ determine one another up to an atom at zero. The parallels to the Marčenko–Pastur law are clear. The following corollary shows a similar correspondence for a recent result of Shlyakhtenko and Tao [20] having to do with the interpretation of the (scalar-valued) free convolution semigroup μ^{⊞t} in terms of unitarily invariant minors of large matrices.
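The equality of nonzero eigenvalues of X† X and X X† invoked here is elementary to check numerically (a sketch assuming NumPy, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(5)

# X^T X (p x p) and X X^T (N x N) share their nonzero eigenvalues;
# the larger Gram matrix carries |N - p| extra zeros.
N, p = 30, 20
X = rng.standard_normal((N, p))

small = np.sort(np.linalg.eigvalsh(X.T @ X))    # p eigenvalues
big = np.sort(np.linalg.eigvalsh(X @ X.T))      # N eigenvalues

match = np.allclose(big[N - p:], small, atol=1e-8)       # top p agree
extra_zeros = np.allclose(big[:N - p], 0.0, atol=1e-8)   # N - p zeros
```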
Corollary 4
Let C_p be an SLCE with data (μ, λ), with X_p as in Definition 2. For k ∈ [1, ∞), let Q_p and Q̃_p denote sequences of p × p and N × N random diagonal matrices, respectively, whose diagonal entries are in {0, 1} and whose fractions of nonzero entries converge to 1/k. Then, X_p† Q̃_p X_p has a limiting spectral distribution λ/k(μ), and Q_p C_p Q_p has a limiting spectral distribution as well.

Amalgamated Free Lévy Processes
Definition 6
In this article, we define an operator-valued ∗-probability space (sometimes called an algebraic probability space, see [1,24]) as a collection of data (A, τ, D, [·]) such that

1. A is a ∗-algebra.

We take D⟨x_1, . . . , x_n⟩ to be the algebra generated by D and the elements x_i ∈ A. We say that the (univariate) distribution of an element x ∈ A is the collection of multi-linear maps …

Definition 8
A -free Lévy process x_t on an operator-valued ∗-probability space …

We take C⟨x⟩ to be the space of formal linear combinations of -monomials in x, the so-called -polynomials. This is a ∗-algebra in the natural way: The product of monomials is concatenation of words and bracketed words, and involution on monomials is the expression of words and bracketed words in reverse order. Note that in the construction of our algebra C⟨x⟩, we are implicitly assuming that the x_j are self-adjoint. When the indeterminates x are clear, we write D ⊆ C⟨x⟩ for the subalgebra of elements w such that [w] = w. Similarly, D⟨x_j⟩ is the subalgebra generated by bracketed words and the indeterminate x_j. Elements of D⟨x_j⟩ are precisely linear combinations of alternating monomials of the form

[w_0] x_j^{r_1} [w_1] x_j^{r_2} · · · x_j^{r_l} [w_l],

where w_l ∈ C⟨x⟩ and r_l ∈ N.
In the context of p × p matrices, we write [A] for the diagonal of A, that is, the matrix such that

[A]_{i,j} = δ_{i,j} A_{i,j}.

Let c = (C_j)_{j∈J} be a family of p × p matrices. If q ∈ C⟨x⟩ is a -polynomial on the same index set J, then we let q(c) denote the p × p matrix formed by linear combinations of matrix products and applications of matrix diagonalization.

Lemma 3 (Asymptotic -Freeness of SLCE Families)
Let c_p = (C_p^{(1)}, . . . , C_p^{(K)}) denote a family of K independent SLCE C_p^{(k)} with data (μ_k, λ_k), where the μ_k ∈ ID_b(∗) are essentially bounded. If q ∈ C⟨x⟩ is a -polynomial in the indeterminates x = (x_1, . . . , x_K), then the following limit exists and is finite:

lim_{p→∞} E[ (1/p) Tr q(c_p) ].

Furthermore, let q_m ∈ C⟨x⟩ for m = 1, . . . , M be a collection of -polynomials, and let … be a sequence of p × p diagonal matrices of the form … as above.
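The diagonal map [·] behaves as a conditional expectation onto the diagonal subalgebra; a small sketch (assuming NumPy; not part of the original text) verifies the D-bimodule and idempotence properties on random matrices:

```python
import numpy as np

rng = np.random.default_rng(6)

def diag_map(A):
    """[A]: keep the diagonal of A and zero out the rest -- the
    conditional expectation onto the diagonal subalgebra."""
    return np.diag(np.diag(A))

p = 8
A = rng.standard_normal((p, p))
D1 = np.diag(rng.standard_normal(p))
D2 = np.diag(rng.standard_normal(p))

# Bimodule property of a conditional expectation: [D1 A D2] = D1 [A] D2.
bimodule_ok = np.allclose(diag_map(D1 @ A @ D2), D1 @ diag_map(A) @ D2)
# Idempotence: [[A]] = [A].
idempotent_ok = np.allclose(diag_map(diag_map(A)), diag_map(A))
```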
We mention that our use of [14] in order to show the existence of the limit for a -polynomial follows from the larger theory of graph operations on families of random matrices, detailed fully in the recent monograph [15]. As the proof of [14, Theorem 2.3] encompasses the so-called graph monomials, the inclusion of C⟨x⟩ and its -polynomial extension in the space of graph polynomials CG⟨x⟩ shows that the limit exists for -polynomials as well.
Proof For each pair (μ_k, λ_k), we let C_p^{(k)} denote an SLCE such that the entries of each X_p^{(k)} are jointly independent of all others across the family. We write c_p = (C_p^{(1)}, . . . , C_p^{(K)}) for the K-tuple of p × p self-adjoint random matrices. Let A_0 = C⟨x⟩ be the space of -polynomials in the indeterminates x = {x_k}_{k=1,...,K}, as above. We define a tracial state τ on the -polynomials of A_0 in the following way:

τ[q] = lim_{p→∞} E[ (1/p) Tr q(c_p) ].   (10)

This limit exists by Lemma 3. Now, consider Z_τ = {q ∈ A_0 : τ[q∗q] = 0}. By Cauchy–Schwarz on τ, we have that for n > 1,

τ[(q∗q)^n] = τ[((q∗q)^{n−1} q∗) q] ≤ τ[q (q∗q)^{2n−2} q∗]^{1/2} τ[q∗q]^{1/2} = 0,

which shows that τ[(q∗q)^n] = 0 for all n ∈ N. Now, if q ∈ Z_τ and r ∈ A_0, we have

τ[(rq)∗(rq)] = τ[q∗(r∗r)q] ≤ τ[q∗(r∗r)²q]^{1/2} τ[q∗q]^{1/2} = 0.

A similar argument on the right shows that Z_τ is a two-sided ideal in A_0, and so we let A = A_0/Z_τ. Take D to be the set of equivalence classes of elements from D ⊆ C⟨x⟩, this is to say the elements q ∈ A such that [q] = q. Our data (A, τ, D, [·]) are an operator-valued ∗-probability space, following from properties of the matrix trace, the matrix diagonal, and (10).

We now consider the monomials q_k(x) = x_k. These are self-adjoint, and their classical moments can be computed via the expressions

τ[x_k^n] = lim_{p→∞} E[ (1/p) Tr (C_p^{(k)})^n ].

By the weak convergence of the ESD of an SLCE in Theorem 1, this equals the n-th moment of λ_k(μ_k). The existence of exponential moments of λ_k(μ_k) from Corollary 2 guarantees that it is completely determined by its moment sequence, so we have L(x_k) = λ_k(μ_k).
To see that the elements x_k are -free, let q_m ∈ A for m = 1, . . . , M be such that q_m ∈ D⟨x_{i_m}⟩ with i_m ≠ i_{m+1} for each 1 ≤ m ≤ M − 1. These q_m are precisely those in the equivalence classes of elements of D⟨x_{i_m}⟩. Setting

x_{t_k} − x_{t_{k−1}} = … ,

where the Ĉ_p^{(k)} are independent and permutation-invariant p × p elements of an SLCE with data (μ, t_k − t_{k−1}), and applying the asymptotic -freeness to these Ĉ_p^{(k)} as in Theorem 4, it follows that the difference elements x_{t_k} − x_{t_{k−1}} ∈ A are -free.