Beyond islands: a free probabilistic approach

We give a free probabilistic proposal for computing the fine-grained radiation entropy of an arbitrary bulk radiation state, in the context of the Penington-Shenker-Stanford-Yang (PSSY) model where the gravitational path integral can be implemented with full control. We observe that the replica trick gravitational path integral matches, combinatorially, the free multiplicative convolution between the spectra of the gravitational sector and the matter sector respectively. The convolution formula computes the radiation entropy accurately even in cases where the island formula fails to apply. It also helps to justify this gravitational replica trick as a soluble Hausdorff moment problem. We then work out how the free convolution formula can be evaluated using free harmonic analysis, which also yields a new free probabilistic treatment of resolving the spectrum of the separable sample covariance matrix. The free convolution formula suggests that the quantum information encoded in competing quantum extremal surfaces can be modelled as free random variables in a finite von Neumann algebra. Using the close tie between free probability and random matrix theory, we show that the PSSY model can be described as a random matrix model that is essentially a generalization of Page's model. It is then manifest that the island formula is only applicable when the convolution factorizes, in regimes characterized by the one-shot entropies. We further show that the convolution formula can be reorganized into a generalized entropy formula in terms of the relative entropy.


Introduction
Recent advances in the black hole information puzzle [1,2] feature a quantitative characterisation of the entropy of Hawking radiation that is consistent with unitarity [3][4][5][6][7][8]. The radiation entropy is shown to follow the Page curve, and this resolves the entropic information puzzle, by which we mean the tension between the ever-increasing radiation entropy computed by Hawking and the entropy demanded by unitarity, which follows the Page curve and stops increasing after the Page time [10,11]. The entropy is computed using the gravitational path integral (GPI) implementing the replica trick [12][13][14]. In semiclassical gravity, the GPI can be evaluated in the saddle-point approximation, and the key insight is that a wormhole saddle connecting the replicas is generically dominant for an old evaporating black hole. The resulting entropy of Hawking radiation is described by the island formula,

S(R) = min_I ext_I [ A(∂I)/4G_N + S(I ∪ R) ] . (1.1)

This formula equates the radiation entropy to the generalized entropy evaluated at the island I in the black hole spacetime. The surface ∂I that bounds the island is identified via an extremisation procedure and is known as the quantum extremal surface (QES) [15]. In particular, it identifies an island I inside the black hole after the Page time, and this allows the resulting radiation entropy S(R) to stop increasing, hence resolving the paradox. Some fine print is needed to interpret this formula. The entropy that appears on the right is the entropy in the semiclassical theory. It is the von Neumann entropy of the quantum state reduced on the joint region I ∪ R, and we shall refer to it as the state in the effective description, or simply as the bulk state, borrowing the language of holographic duality. The island formula is then believed to capture the radiation entropy, as would be obtained from the von Neumann entropy of a density operator computed directly in a complete theory of quantum gravity.
We shall refer to such a density operator as the fundamental description of the radiation, or as the boundary state, again using the analogy of AdS/CFT. We add a tilde (e.g. R̃) in the fundamental description to help distinguish it from the effective description. In general, we cannot deduce the fundamental description in full from the replica trick calculation, and the best we can hope for is to extract the spectrum of its density operator.
However, the island formula is not guaranteed to hold in all circumstances. If one examines its derivation carefully [6,7,13,14], an important step is to assume that the replica-symmetric (i.e. Z_n-symmetric) saddles, namely the fully disconnected Hawking saddle before the Page time and the fully connected replica wormhole saddle after the Page time, always dominate over the non-symmetric ones. When the non-replica-symmetric saddles aggregate a significant contribution, comparable to that of the replica-symmetric saddles, this assumption becomes invalid and replica symmetry breaking kicks in.
This is a fair simplification as far as demonstrating the qualitative features of the Page curve is concerned. However, it has recently been argued and shown that the island formula does not compute the radiation entropy accurately when the effect of replica symmetry breaking is significant, which can occur in the presence of a non-flat entanglement spectrum between the radiation and the black hole [16,17]. In such cases, the quantum extremal surfaces and islands cannot be identified. This is to be expected: when the GPI is not dominated by a single geometry, the entropy cannot be evaluated on a single semiclassical geometry, because a superposition of distinct geometries is relevant.
The investigation of replica symmetry breaking in the context of the black hole information problem is largely unexplored, mostly because the non-replica-symmetric saddles cannot be analysed with good control in the GPI. One exception is the toy model due to Penington-Shenker-Stanford-Yang (PSSY) [6]. It consists of a black hole in JT gravity coupled to a static end-of-the-world (EOW) brane that carries internal degrees of freedom modelling the black hole interior. It is perhaps the simplest setup that mimics an evaporating black hole while keeping the entropic information puzzle manifest. The upshot of the simplification is that one can make sense of summing over all saddles with good control. It is thus possible to carry out a detailed quantitative analysis probing replica symmetry, which could be challenging in other models.

Problem: The PSSY model with non-flat entanglement spectrum
Our goal in this work is to explicitly compute, in the PSSY model, the spectrum of the radiation state in the fundamental description given any bulk entanglement spectrum, and thus the von Neumann or Rényi entropies of the radiation. A non-flat spectrum is relevant when we apply quantum information processing to the radiation or when the reservoir is governed by some interesting Hamiltonian. The bottom line is that it is a more general setting for studying the entropic information puzzle that has not been thoroughly investigated. We shall see that this slight generalization offers some insights on the replica trick and the information puzzle.
We start by briefly reviewing the PSSY model. The black hole is described in JT gravity, which is a 2D theory of gravity that couples to a dilaton field φ. Concretely, we work with the Euclidean JT action [6,7,18],

I_JT = −S_0 χ(M) − (1/2) ∫_M √g φ (R + 2) − ∫_∂M √h φ (K − 1) , (1.2)

where S_0 is the extremal entropy of the black hole solution, i.e., the black hole entropy in the zero temperature limit. The total action is appended by an additional term describing a static brane that sits at the end of the world,

I_brane = μ ∫_brane ds , (1.3)

where μ denotes the tension of the EOW brane. The EOW brane carries internal degrees of freedom that are entangled with the Hawking radiation in an auxiliary k-dimensional reservoir system outside the spacetime (cf. Fig. 1a). Specifically, the scenario studied by PSSY concerns the maximally entangled bulk state in the effective description:

|Ψ⟩ = (1/√k) Σ_{i=1}^k |ψ_i⟩_{R′} |i⟩_R , (1.4)

where |ψ_i⟩_{R′} is the brane state acting as the Hawking partner that purifies the i-th radiation mode |i⟩_R. 2 Here the Schmidt coefficients are chosen to be flat, with Schmidt rank k. The model does not describe a dynamical evaporation process. With the entanglement between the radiation and the black hole put in by hand, the PSSY model should be viewed as a snapshot of a physical black hole that is evaporating quasi-statically. To see the evolution of the radiation entropy and the Page curve, one simply tunes up k. Despite its simplicity, the entropic information puzzle is still manifest, as the radiation entropy seems to increase without bound as we tune up the bulk entanglement.

2 In PSSY equation (2.5), they write down the state |ρ̃⟩_BR := (1/√k) Σ_{i=1}^k |ψ_i⟩_B |i⟩_R to motivate the boundary conditions for the GPI. We put a tilde to indicate that, in our terms, |ρ̃⟩_BR is the state in the fundamental description, where |ψ_i⟩_B is the black hole state when the brane is in state i. These states have overlaps of O(e^{−S_BH}), such that S(R) follows the Page curve. Here we specifically refer to the state in the effective description. It is written in Schmidt decomposition, where the brane states |ψ_i⟩_{R′} are strictly orthogonal.

Figure 1: The PSSY model. The setup is sketched in Fig. 1a, where the EOW brane-radiation bipartite system is labeled as R′R and the black hole-radiation system is labeled as B̃R̃. We shall consider an arbitrary pure bipartite state on R′R (effective description), denoted by the dashed line, and aim to compute the entanglement spectrum of B̃R̃ (fundamental description). This calculation is achieved using the replica trick, which concerns, for instance, three replicas of boundary conditions as depicted in Fig. 1b. The EOW brane indices are contracted by the 3-swap operator η_3 = δ_{j_1 i_2} δ_{j_2 i_3} δ_{j_3 i_1}. The dominant replica-symmetric saddles in the GPI are the fully disconnected (Hawking) saddle (Fig. 1c) and the fully connected wormhole saddle (Fig. 1d).
The island formula in the PSSY model is simply,

S(R) = min{S_BH , log k} , (1.5)

where S_BH is the Bekenstein-Hawking (BH) entropy. 3 According to the formula, the Page curve has two regimes, log k < S_BH and log k > S_BH. In the former case, the QES is the empty surface, so we have no area term but only a bulk entropy contribution log k; in the latter case, the QES sits at the horizon and we have no bulk entropy contribution, S(I ∪ R) = 0, but only a constant area term S_BH. The island formula therefore provides us with a sketch of the Page curve for an eternal black hole as we tune up k: the entropy increases with k until it flattens out for log k > S_BH. It is very accurate in the microcanonical ensemble, and subject to a subleading 1/β correction near the Page transition in the canonical ensemble of inverse temperature β.
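As a quick numerical illustration of the two regimes in (1.5), the sketch below traces the island-formula Page curve as k is tuned up; the value S_BH = 10 (in natural-log units) is an arbitrary choice for illustration, not taken from the model:

```python
import numpy as np

S_BH = 10.0  # Bekenstein-Hawking entropy; arbitrary illustrative value
log_k = np.linspace(0.0, 20.0, 201)  # bulk entanglement log k, tuned up by hand

# island formula (1.5): no-island regime for log k < S_BH,
# island behind the horizon (plateau at S_BH) afterwards
S_R = np.minimum(log_k, S_BH)
```

The curve rises linearly in the Hawking regime and plateaus at S_BH after the Page transition at log k = S_BH.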
To see where this formula comes from, we use the replica trick. The idea is to compute the moments, Tr ρ̃_R^n, of a hypothetical radiation density matrix ρ̃_R. Since we do not have such a density matrix a priori, we compute the moments by preparing replicas of the black hole-radiation system and evaluating the expectation value of the n-swap operator η_n using the GPI. The rule is to integrate over all the configurations compatible with the boundary conditions that prepare the state and impose the observable η_n. Since the full path integral is challenging, we instead sum over all saddle-point configurations to approximate the GPI. The quantity that the GPI computes can be thought of as a partition function Z̃_n = ⟨η_n⟩ with the replica trick boundary condition.
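The finite-dimensional identity underlying the replica trick, Tr(η_n ρ^{⊗n}) = Tr ρ^n, can be checked directly. Below is a minimal sketch (using an arbitrary 3-dimensional density matrix, not the PSSY state) verifying it for n = 3:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 3

# an arbitrary density matrix: positive and unit trace
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = X @ X.conj().T
rho /= np.trace(rho).real

# cyclic n-swap operator eta_n on (C^d)^{\otimes n}:
# eta_n |i_1 i_2 ... i_n> = |i_n i_1 ... i_{n-1}>
eta = np.zeros((d**n, d**n))
for idx in np.ndindex(*(d,) * n):
    shifted = idx[-1:] + idx[:-1]
    eta[np.ravel_multi_index(shifted, (d,) * n),
        np.ravel_multi_index(idx, (d,) * n)] = 1.0

rho_n = rho
for _ in range(n - 1):
    rho_n = np.kron(rho_n, rho)

lhs = np.trace(eta @ rho_n)                       # <eta_n> on n replicas
rhs = np.trace(np.linalg.matrix_power(rho, n))    # Tr rho^n
```

The expectation value of the cyclic swap on n replicas reproduces the n-th moment, which is exactly what the GPI computes for ρ̃_R.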
The boundary condition is illustrated for three replicas in Fig. 1b. It consists of three copies of an asymptotic boundary (solid lines) of length β that prepare the JT black hole at temperature 1/β, with the end points joined by the sources that create the EOW branes. The branes carry internal degrees of freedom indicated by the dangling dashed lines. As labeled, the indices indicate the matrix elements of the radiation density operator that we would like to prepare. The dashed lines are connected cyclically, corresponding to evaluating the expectation value ⟨η_3⟩. When the EOW branes form in a saddle geometry that fills in this boundary condition, the dashed lines extend along the branes, as the internal degrees of freedom physically reside on the branes. They thus form loops, as depicted in Fig. 1c. We refer to them as the EOW brane loops.
We work in the planar limit, which means that we ignore topologies with crossings or higher genus. This is legitimate when we work in the regime S_BH, log k ≫ 1, as the nonplanar geometries incur additional suppression factors of order O(e^{−S_BH}), O(1/k) in the on-shell action. Such planar saddles are organized by non-crossing (NC) partitions (cf. Definition 4). From each saddle, there are two contributions to the partition function Z̃_n: one from the gravitational sector, denoted as the gravitational partition function Z_n from an n-connected disk, i.e. a replica wormhole connecting n boundaries; and the other from the matter sector, where a length-n EOW brane loop gives a contribution of Tr (kρ_R)^n, which always evaluates to k for the reduced state ρ_R of (1.4).
A quick calculation shows that when log k < S_BH (log k > S_BH), the fully disconnected (fully connected) saddle dominates, yielding the island formula (1.5). Note that we have ignored the non-replica-symmetric saddles in deriving the island formula, but they can be included, and doing so results in a correction of order 1/√β near the transition. Nonetheless, the leading order behaviour is captured by the island formula.
Consider now a pure bulk state with an arbitrary entanglement spectrum, 4

|Ψ⟩ = Σ_{i=1}^k √λ_i |ψ_i⟩_{R′} |i⟩_R . (1.6)

Unlike in the original PSSY setting with the bulk state (1.4), the EOW brane loops are now weighted by the Schmidt coefficients λ_i, and an n-connected loop evaluates to Tr (kρ_R)^n. We keep using k to denote the Schmidt rank of the bipartite state. It is shown explicitly in [17] that a bulk state with a non-trivial entanglement spectrum, specifically a state in a superposition of two branches, 5 does lead to a leading order correction to the island formula.
This feature was first pointed out by Akers-Penington (AP) in the QES prescription in AdS/CFT [16]. Heuristically, the non-uniform correlation in the matter sector enhances the effect of replica symmetry breaking. The non-flatness of the spectrum is gauged by the difference between the min-entropy and the max-entropy of the bulk state, which are the entropic measures invented to characterize information-processing tasks in the one-shot regime [19].
We expect the failure of the island formula at leading order if their difference is comparable to the area term (BH entropy). This deviation occurs in a large regime "near" the transition, governed by the gap between the min-entropy and the max-entropy. In fact, the Page time, usually defined as the time when the bulk entanglement entropy is equal to the BH entropy, is no longer a relevant scale 4 to characterize the transition. It is superseded by the one-shot versions, which indicate when the island formula ceases to apply. We shall now attack the problem for an arbitrary bulk entanglement spectrum in (1.6), for which we need a replacement for the island formula. Implementing the replica trick gives the partition functions {Z̃_n}_{n∈N}, in which the contributions are organized by the non-crossing partitions. Each saddle corresponds to a configuration of wormholes described by a NC partition π (cf. Fig. 3), and the total contribution from the gravitational sector is simply the product of the contributions {Z_{|V|}}_{V∈π}, one for each |V|-connected wormhole. See Fig. 2 for the example of Z̃_3.

4 For a generic mixed state, which can always be considered as a marginal of a purification, we have a tripartite correlated system among the black hole, the radiation and the reference. The reason we restrict to pure bulk states is that we mostly care about the quantum correlation between the radiation and the black hole in the context of the information puzzle. Also, the Page curve concerns the entanglement entropy, and this is an operationally meaningful measure of entanglement only between the radiation and the black hole.

5 This state has the Schmidt coefficients in (1.6) chosen as {λ_i}_{i=1}^m = x, {λ_i}_{i=m+1}^k = y with x ≫ y and k ≫ m.

Figure 2: Saddles for three replicas. The figure shows all five relevant saddles for three replicas. The middle three in the bracket are the non-replica-symmetric ones, which are ignored in the island formula derivation but included in our calculation. The partition function Z̃_3 evaluates to the sum of these five contributions.
To sort out the contributions from the matter sector, we observe that the EOW brane loops define another set of cycles/partitions, corresponding to a unique NC partition π̄ ∈ NC_n dual to each π ∈ NC_n. This is called the Kreweras complement of π, defined as follows. The rule is to replicate another copy of [n] (say in red) and interlace it with the original set [n] (say in black); then link up as many red dots as possible without crossing the black blocks. It is evident that this procedure defines a unique NC partition π̄. (See Fig. 3b for an illustration; a precise definition is provided in Section 2.3.)
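The interlacing rule above can be implemented compactly via a standard permutation identity: writing each block of π as a cycle on its elements in increasing order, the Kreweras complement is read off from the cycles of π⁻¹γ, where γ = (1 2 ... n) is the long cycle. A minimal sketch (the function name is ours):

```python
def kreweras_complement(pi, n):
    """Kreweras complement of a non-crossing partition pi of {1, ..., n}.

    Uses the identity K(pi) = pi^{-1} * gamma, where gamma = (1 2 ... n)
    and each block of pi acts as a cycle on its elements in increasing order.
    """
    perm = {}
    for block in pi:
        b = sorted(block)
        for i, x in enumerate(b):
            perm[x] = b[(i + 1) % len(b)]
    inv = {v: k for k, v in perm.items()}
    sigma = {i: inv[i % n + 1] for i in range(1, n + 1)}  # pi^{-1} after gamma
    # the cycles of sigma are the blocks of the complement
    seen, blocks = set(), []
    for i in range(1, n + 1):
        if i in seen:
            continue
        cyc, j = [], i
        while j not in seen:
            seen.add(j)
            cyc.append(j)
            j = sigma[j]
        blocks.append(sorted(cyc))
    return blocks
```

For example, the complement of (1 3)(2)(4) ∈ NC_4 is (1 2)(3 4), and the complement of the one-block partition of [n] is the partition into singletons, matching the duality between a single n-connected wormhole and n short brane loops.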
We therefore have the matter contributions organized by the Kreweras complement as well, so that the partition function takes the form

Z̃_n = Σ_{π ∈ NC_n} ( Π_{V ∈ π} Z_{|V|} ) ( Π_{V̄ ∈ π̄} Tr (kρ_R)^{|V̄|} ) . (1.8)

With the notion of the Kreweras complement defined, this formula can be readily read off from the geometric diagrams of the saddles. The main hypothesis of the replica trick is that the Z̃_n's (with appropriate normalization) are the moments of a density operator of the radiation in the fundamental description, i.e. Tr ρ̃_R^n. The replica trick GPI acts like an oracle that takes as inputs the moments of the matter density operator and the moments of the black hole thermal spectrum, and then outputs a collection of moments. Mathematically, the replica trick should be understood as a moment problem. Generally, there is no guarantee that this moment problem is well-posed, in the sense that a solution exists and is unique. Since we have a rather explicit formula for the moments, we can in principle address this issue and hopefully work out the spectrum of the density operator. The only trouble is that it is not obvious what the formula is actually doing, and the key is to examine it through the lens of free probability.

Figure 3: The combinatorics of the saddles. The saddles are fully characterized by the set of NC partitions. Specifically, any wormhole configuration corresponds to a NC partition, and the configuration of the EOW brane loops corresponds to its Kreweras complement, which is precisely defined via a partition diagram that underlies the saddle configuration. Here we illustrate a generic example in which the wormholes are depicted by the partition (in black) π = (1)(2 3 12 13)(4 6 9)(5)(7 8)(10)(11) ∈ NC_13, and the EOW brane loops are depicted by its Kreweras complement (in red) π̄ = (1 13)(2)(3 9 10 11)(4 5)(6 8)(7)(12) ∈ NC_13.

Solution: The free probabilistic toolkit
The key observation is that the combinatorial structure of how the moments are processed in (1.8) precisely matches how the free multiplicative convolution between two probability distributions works (cf. (2.39)). To see this, we need a brief introduction to free probability theory. We shall follow Speicher's combinatorial treatment [20][21][22], 6 in which the theory is built upon the key object of free cumulants. We will give an introduction to non-commutative random variables, freeness and the free probabilistic toolkit in Section 2.
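To preview the combinatorics: moments and free cumulants are related by a sum over non-crossing partitions, m_n = Σ_{π∈NC_n} Π_{V∈π} κ_{|V|}, which is the same structure as (1.8). A minimal sketch (brute force over set partitions, adequate for small n) verifying that NC_4 has Catalan(4) = 14 elements and that the semicircle law, whose only non-zero free cumulant is κ_2 = 1, has even moments given by the Catalan numbers:

```python
def set_partitions(elems):
    """All set partitions of a list of distinct elements."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def is_noncrossing(part):
    """True iff no a < b < c < d has a, c in one block and b, d in another."""
    for B1 in part:
        for B2 in part:
            if B1 is B2:
                continue
            for a in B1:
                for c in B1:
                    if a < c and any(a < b < c for b in B2) \
                            and any(d < a or d > c for d in B2):
                        return False
    return True

def moment(n, kappa):
    """Free moment-cumulant formula: m_n = sum over NC(n) of prod kappa_{|V|}."""
    total = 0
    for part in set_partitions(list(range(1, n + 1))):
        if is_noncrossing(part):
            prod = 1
            for block in part:
                prod *= kappa.get(len(block), 0)
            total += prod
    return total
```

With κ_2 = 1 and all other cumulants zero, only non-crossing pairings contribute, giving m_4 = 2 and m_6 = 5.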
In order to match the GPI precisely with the free multiplicative convolution, we have to deal with the issue that the black hole thermal spectral distribution, denoted as ν_b (cf. (3.3)), is not a probability distribution, because the black hole density of states in the large N limit is non-integrable and so is ν_b. Instead, the moments of ν_b should be viewed as the free cumulants of another spectral distribution, known as the free compound Poisson distribution μ_{ν_b}. Note that μ_{ν_b} is completely determined by ν_b, which can be extracted from the GPI in the JT gravity + EOW brane theory and is parameterized by the temperature and the brane tension. Finally, we obtain the spectral distribution of the radiation as a free multiplicative convolution, which we sometimes simply refer to as the free convolution,

ν_r̃ = ν_r ⊠ μ_{ν_b} . (1.9)

Mathematically, this result says that the replica trick GPI implements the free multiplicative convolution for the two input (bulk) distributions effectively describing the black hole, ν_b, and the radiation, ν_r. The convolution formula computes the radiation entropy accurately even in cases where the island formula fails to apply. The result in turn justifies that our moment problem is indeed well-posed: we shall argue the existence and uniqueness of the resulting distribution, satisfying the premise of the replica trick. This is our main result, and the details will be provided in Section 3.1. Given (1.9), the radiation entropy simply equals the (continuous) entropy of ν_r̃ (cf. (3.23)). Such a free convolution formula for the radiation entropy, unlike the island formula (1.1) or the QES formula, is not built upon the generalized entropy of some extremal surface γ, S_gen(γ) = A[γ]/4G_N + S_bulk, such that its contribution to the partition function dominates over the rest. We learn from our solution that generally there is no clean separation among the contributions, so minimizing the generalized entropy over the candidate QES does not work.
Instead, the partition functions associated with the two competing QES are freely convoluted by the GPI, so (1.9) supersedes the island formula in the PSSY model. An example of such a Page curve 7 is plotted in Fig. 4 to demonstrate the large corrections. The departure from the island formula agrees well with the one-shot entropy analysis (cf. Section 3.5.1).
It may seem that all we have claimed so far merely amounts to giving the replica trick (1.8) a fancy name and a fancy symbol, but the real advantage of having (1.9) is that we can use tools from free harmonic analysis to evaluate ν_r̃. This is entirely analogous to how we use Fourier transforms to evaluate convolutions between probability distributions of classical (scalar) random variables.
Generally, a free multiplicative convolution is difficult to evaluate even numerically. To do so, one needs to invoke a heavy set of machinery from the operator-valued free probability theory together with a couple of tricks. Fortunately, this particular free convolution at hand (1.9) is analytically tractable using the standard tools from free harmonic analysis. In Section 3.2 we show in detail how it can be done. Our result is summarized as an algorithm below.
For any z in the upper half plane C⁺ := {z ∈ C | Im(z) > 0}, solve a fixed-point equation for the subordination function y(z), with the two input distributions ν_r, ν_b supported on the positive real numbers; this fixed-point equation is numerically easy to solve. Then plug the solution y(z) into a second equation, which gives the Cauchy transform of ν_r̃, denoted as G_{ν_r̃}, that is analytic on C⁺. The spectral distribution can then be extracted using the Stieltjes inversion,

ν_r̃(x) = −(1/π) lim_{ε→0⁺} Im G_{ν_r̃}(x + iε) .

The resulting distribution does not admit a closed-form expression for a generic ν_r, and this algorithm gives the most explicit description that one can hope for. In the context of random matrix theory, our solution also applies to resolving the spectrum of a class of random matrix ensembles known as the separable sample covariance matrices. The same algorithm was obtained using the moment method in the random matrix literature [24][25][26][27]. Here we offer a novel free probabilistic treatment that could potentially be of independent interest.
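The Stieltjes inversion step can be illustrated on its own. The sketch below is a textbook example, not the PSSY convolution: it recovers the semicircle density from its known Cauchy transform G(z) = (z − √(z−2)√(z+2))/2 by evaluating −Im G(x + iε)/π just above the real axis:

```python
import numpy as np

def cauchy_semicircle(z):
    # Cauchy transform of the standard semicircle law on [-2, 2];
    # the two square-root factors pick the branch with G(z) ~ 1/z at infinity
    return (z - np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

eps = 1e-8  # small distance above the real axis
x = np.linspace(-3, 3, 601)
density = -cauchy_semicircle(x + 1j * eps).imag / np.pi

# exact density sqrt(4 - x^2) / (2 pi) on [-2, 2], zero outside
exact = np.where(np.abs(x) <= 2,
                 np.sqrt(np.clip(4 - x**2, 0, None)) / (2 * np.pi), 0.0)
```

In the algorithm above, the same inversion is applied to G_{ν_r̃}(x + iε) obtained from the fixed-point equation, with ε taken small.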
The free convolution formula suggests that the quantum information encoded in the gravitational sector and the matter sector, or more generally the information encoded in two candidate QES, 8 should be modelled as free random variables in a non-commutative probability space, which we choose to be a finite von Neumann algebra. Voiculescu taught us that free random variables can be further modelled by large independent random matrices [28], so we can deduce a random matrix/tensor model from the algebraic random variables for the PSSY model. We show that this random matrix model is nothing but a generalization of Page's model, which is perhaps the very first random tensor network model used to study the information-theoretic aspects of gravity. This is discussed in Section 3.3. We provide more details on the interplay between random matrices and free probability in Appendix B.
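Page's model itself can be sampled directly: a Haar-random bipartite pure state has reduced-state entropy concentrating near Page's average S ≈ ln k − k/(2N) for k ≤ N. A minimal numerical sketch (the dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
k, N = 16, 1024  # small subsystem k, large complement N (illustrative values)

# a random bipartite pure state (Gaussian amplitudes, then normalized)
psi = rng.normal(size=(k, N)) + 1j * rng.normal(size=(k, N))
psi /= np.linalg.norm(psi)

rho = psi @ psi.conj().T          # reduced density matrix on the k-dim factor
evals = np.linalg.eigvalsh(rho)
evals = evals[evals > 1e-15]
S = float(-np.sum(evals * np.log(evals)))  # von Neumann entropy

page_avg = np.log(k) - k / (2 * N)  # Page's average entropy for k <= N
```

The sampled entropy lands close to Page's value, just below the maximal ln k, mirroring how the radiation entropy in the fundamental description falls just short of the bulk entanglement.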
It is legitimate to worry that the free convolution formula we obtained could be something very special to the PSSY model, and it is unclear if it can be generalized to realistic black holes. To address this concern, we make an attempt to recast the entropy formula in a way that does not explicitly entail the free convolution. We show that the convolution formula can be formulated as a generalized entropy, resembling the formulation due to Wall in his proof of the generalized second law [29]. It equates, up to a state-independent constant, the radiation entropy to the relative entropy between the boundary state and the Hartle-Hawking state. In this sense, the difference in the radiation entropy precisely measures Hawking's surprisal in learning the actual state of the radiation while holding the hypothesis that it is a thermal state. This is discussed in Section 3.4.
We also make contact with some relevant results in the literature, and perform some consistency checks on our proposal. In particular, we reproduce the original results of PSSY, and we show that the free convolution formula reduces to the island formula in regimes governed by the one-shot entropies, as expected for the general QES prescription [16,17]. This is discussed in Section 3.5.
Lastly, in Appendix A, we discuss an observation that an old black hole encodes a channel that violates the additivity of the minimum channel output entropy. The latter was a fundamental conjecture in quantum Shannon theory [30], equivalent to the additivity of the classical capacity of quantum channels, and it was proven false using a random channel construction by Hastings [31]. Here we see that an old black hole naturally encodes such a random channel, matching a known counterexample to the conjecture [32]. Even though for an old black hole the subleading non-replica-symmetric saddles can safely be ignored as far as the entropies are concerned, the additivity violation is only visible when all the saddles are counted. Surprisingly, summing over all saddles can be important in regimes where we usually do not expect it to be.

Notations
We use lower case letters such as r, b, c, u, ... to denote non-commutative random variables; they belong to a non-commutative probability space (W, τ), which is a finite von Neumann algebra represented as bounded operators on an infinite-dimensional Hilbert space. It is equipped with a trace τ that gives the expectation values of the random variables. Their spectral distributions are denoted as ν_r, ν_b, etc. The symbol μ is reserved for specifically defined distributions. We use upper case letters such as R, B, C, U, ... to denote finite dimensional (random) matrices. We denote the normalized trace as tr := (1/N) Tr. For density operators under this notation convention, we shall not assign additional symbols to the density matrices, but instead directly use the system symbol to denote its density matrix (such as R := ρ_R, etc). To avoid confusion, we use serif fonts, such as B, R, to denote the systems. The entropies of density operators are then simply denoted as S(R), S_α(R), etc. We shall use the tilde symbols R̃ and B̃ to denote the density matrices of the radiation and the black hole in the fundamental description (at the boundary), and the symbols R and B in the effective description (in the bulk).

Related works
The large correction to the QES formula was first pointed out by AP [16]. An example of a bulk state in an incoherent superposition was given in support of the argument, but the general behaviour was left indefinite. Later, a large correction was also shown for a similar non-flat bulk state in a coherent superposition of two branches in the PSSY model. Both calculations use the method of the Schwinger-Dyson equation for the resolvent, which was first used by PSSY to derive the Page curve in a canonical ensemble. However, this method does not seem to be applicable to general non-flat bulk states. This work should be viewed as a generalization of the resolvent method that leverages the power of free probability. The basic idea underlying this work already appears in the dissertation of the author [33]; this paper presents a further developed study. Free probability has also been used in computing various distinguishability measures for random states that resemble black hole microstates [34].
A similar result was obtained by Cheng et al. for random tensor networks (RTN). They consider RTN with bonds of non-flat entanglement spectrum modelling area fluctuations [35]. In a situation where there are two minimal cuts homologous to a boundary subregion, the spectral distribution of the boundary reduced state asymptotically equals the free multiplicative convolution of the limiting spectral distributions of the bonds at the two cuts, further convoluted with a Marchenko-Pastur distribution. We will elaborate on how their result compares to ours in Section 3.5.3. In this work, we go one step further by showing how this particular convolution can actually be evaluated in Section 3.2.

A free probability primer
Free probability theory is a probability theory of non-commutative random variables equipped with a special notion of independence called freeness. Before going into freeness, we first need to introduce some basics of non-commutative probability theory. It is usually formulated abstractly in terms of the algebra of random variables and their expectations. The abstraction allows the framework to accommodate classical probability theory, quantum theory and free probability theory. Here we give a minimal introduction of the relevant notions to set the background. We shall approach the subject following the treatments in [21,22,36]. Experts should feel free to skip over this section.

Non-commutative probability spaces
The most general non-commutative probability space is defined on a unital (i.e. containing the multiplicative identity 1) * -algebra A, which is an algebra over C endowed with an involution * such that, for all a, b ∈ A and z ∈ C,

(a + b)* = a* + b* ,  (za)* = z̄ a* ,  (ab)* = b* a* ,  (a*)* = a ,

and elements satisfying a = a* are called self-adjoint.
Examples of * -algebras that are particularly relevant to us are the complex N × N matrices, with the involution being the conjugate transpose; and the bounded complex-valued random variables over some probability space (Ω, F, P), denoted as L∞(Ω, F, P), with the involution being complex conjugation.
For a * -algebra to be a probability space, we need to add an expectation, which is mathematically a unital * -linear (i.e. ϕ(a*) = ϕ(a)*) function ϕ : A → C. Definition 1. A non-commutative probability space 9 (A, ϕ) is defined as a unital * -algebra A equipped with a unital * -linear functional ϕ that is positive and faithful. 10 We call such a linear functional ϕ a state on A. In probabilistic terms, A is an algebra of non-commutative random variables, and the state gives their expectation values. Technically, what we have defined is called a * -probability space.
A familiar example of a non-commutative probability space is quantum theory (A, ϕ), where A ⊂ B(H) is the operator algebra of observables, a subalgebra of the bounded operators on a Hilbert space H, and a vector |ϕ⟩ ∈ H defines the state ϕ(a) := ⟨ϕ|a|ϕ⟩, a ∈ A. The state gives the expectation value of the observable a. In particular, we can consider a positive operator-valued measure (POVM) {m_i}_{i∈I}, and then ϕ(m_i) gives the Born rule probability of observing the outcome i.
This definition of probability space is flexible enough to allows us to talk about both commutative and non-commutative random variables under the same footing. For example, we can take L ∞ (Ω, F, P ) 9 The term "non-commutative" should throughout be interpreted as "potentially" non-commutative, because it can also model commutative random variables. 10 A glossary of jargons: unital: ϕ(1) = 1, positive: ϕ(aa * ) ≥ 0 and faithful: ϕ(aa * ) = 0 ⇐⇒ a = 0.
defined above, which we henceforth abbreviate L∞(Ω), and we take the linear functional to be the expectation E[·] induced by the measure P. Another example is the algebra of N × N (deterministic) complex matrices, (M_N(C), tr), where M_N(C) is the algebra of complex matrices of size N and tr is the normalized trace, tr(X) := (1/N) Tr X for any X ∈ M_N(C). The more interesting example puts these together to accommodate random matrices, (L∞(Ω) ⊗ M_N(C), E ⊗ tr). It is the algebra of N × N random matrices with each matrix entry being a random variable on (Ω, F, P), and it is equipped with the expectation (E ⊗ tr)(X) := E[tr(X)] = (1/N) Σ_{i=1}^N E[X_{ii}]. In the above example of random matrices, the state ϕ = E ⊗ tr has the important property of being tracial, meaning that ϕ(ab) = ϕ(ba). Since our application of free probability theory concerns only the tracial case, we shall from now on restrict to such states. It is often desirable to have a unique tracial state providing the notion of expectation. Furthermore, we would later like to discuss functions of random variables, so we had better work with an algebra that is completed to pave the way for functional calculus. To this end, we can complete the *-algebra to a von Neumann algebra, which is also referred to as a W*-algebra. 11 A von Neumann algebra W is defined as a unital *-subalgebra of the bounded operators B(H) on a Hilbert space H that is closed in the weak operator topology. 12 We denote the commutant of W in B(H) as W′ := {b ∈ B(H) | ab = ba, ∀a ∈ W}. The bicommutant theorem says that a *-algebra W is a von Neumann algebra if and only if W = (W′)′. The center of W is defined as Z(W) := W ∩ W′, and we say W is a factor if the center is trivial, i.e. Z(W) = C1. A (tracial) W*-probability space (W, τ) is then a von Neumann algebra W equipped with a faithful normal tracial state τ. Such a unital tracial normal state is unique when W is a finite factor, which means it is either Type I_n or Type II_1. 15 A Type I_n factor is isomorphic to the familiar matrix algebra M_n(C). A Type II_1 factor only has representations as a subalgebra of bounded operators on an infinite-dimensional Hilbert space H.
Moreover, there are no irreducible representations; that is to say, every vector in H is entangled between W and its commutant W′ ⊂ B(H). Nonetheless, the trace exists for every element of a Type II_1 factor, so we can still make sense of density operators and entropies, which we shall introduce later. A Type II_1 algebra is also the setting where free probability was first invented by Voiculescu to study the free group factors isomorphism problem [38,39]. 11 One can also build a C*-probability space on a C*-algebra; see Chapter 3 in Nica-Speicher [20]. 12 The weak operator topology is the operator topology induced by the seminorms a → (v, aw) for any v, w ∈ H. Namely, a sequence/net of bounded operators a_i ∈ W converges to a ∈ W in the weak operator topology if (v, a_i w) converges to (v, aw) for all v, w ∈ H. 13 More generally, we can allow a non-tracial state τ for a W*-probability space. Since we shall only consider tracial situations, we often omit "tracial" in describing a W*-probability space.
14 A state τ on W is normal if we have τ(sup_λ a_λ) = sup_λ τ(a_λ) for any increasing net (a_λ)_{λ∈Λ} of self-adjoint operators a_λ ∈ W. 15 (Hyperfinite) factors can be classified based on the properties of their projections. Here we give a brief summary. A Type I factor has a minimal projection, and is isomorphic to the algebra of bounded operators on some Hilbert space B(H). A Type II factor has no minimal projection but has a finite projection, and it is further a Type II_1 factor if all its projections are finite. Lastly, a Type III factor has no finite projections. cf. e.g. Jones [37] for detailed explanations.
Unless otherwise specified, we will by default work with a W * -probability space (built on a finite von Neumann algebra) as the stage for free probability theory. For example, note that the random matrix algebra L ∞ (Ω) ⊗ M N (C) is a direct integral of Type I N factors with a center L ∞ (Ω), equipped with a trace E ⊗ tr, so it is indeed a W * -probability space.
A non-commutative probability space (W, τ) can be abstractly defined without the need to concretely set up an underlying sample space and event space as we did for the random matrices. This allows us to make universal statements at the abstract level, a desirable property that is especially convenient for studying random matrices, which often manifest universal features asymptotically. Freeness, as we shall introduce later, is a good example of such an abstraction describing the universal behaviour of independently distributed random matrices.

Non-commutative distributions, density operators and entropies
For a classical random variable with compact support, knowing all its moments is equivalent to knowing its distribution. For instance, given the moments E(X^n) of a real-valued random variable X, we can find a probability distribution µ_X such that E(X^n) = ∫_R x^n dµ_X(x). Similarly, for a collection of scalar random variables X_1, . . . , X_k, we have a joint probability distribution µ_{X_1,...,X_k} that produces all the joint moments, E(X_1^{n_1} · · · X_k^{n_k}) = ∫_{R^k} x_1^{n_1} · · · x_k^{n_k} dµ_{X_1,...,X_k}(x_1, . . . , x_k). Note that we can make sense of the probability distribution defined above in two different ways. Given an underlying Kolmogorov triple (Ω, F, P), µ_{X_1,...,X_k} is nothing but the push-forward of P under the measurable map ω ∈ Ω → (X_1(ω), . . . , X_k(ω)) ∈ R^k. We call this an analytic distribution, and this is what the distributions mean in the integral formulas above.
More abstractly, if we choose to work without referring to an underlying sample space but only with the expectations of our random variables, then the notion of probability distribution should generally be understood algebraically as the linear map µ_{X_1,...,X_k} : C[x_1, . . . , x_k] → C, p ↦ E[p(X_1, . . . , X_k)], where C[x_1, . . . , x_k] denotes the ring of commutative polynomials in k indeterminates x_1, . . . , x_k. We shall call this the algebraic distribution; it is the appropriate notion that generalizes to the non-commutative setting, in which the Kolmogorov triple is usually abstracted away.
Generally, we do not have an analytic counterpart to a non-commutative distribution. Nonetheless, this is possible in the case of a single bounded 16 random variable. 17 We can find an analytic distribution from its moments as in the classical case. Mathematically this is known as a Hausdorff moment problem, which asks for an unknown compactly supported distribution that has the given moments. The solution is unique if it is solvable. 18 Consider a bounded single self-adjoint random variable a ∈ (W, τ). Given its moments τ(a^n), one can find a unique measure ν_a with compact support on R such that τ(p(a)) = ∫ p(x) ν_a(x) dx (2.8) for any polynomial p ∈ C[x]. The support is contained in [−ρ(a), ρ(a)], where ρ(a) is the spectral radius of a (cf. Footnote 16). In fact, this claim holds generally for a *-probability space. 19 However, the advantage of completing a *-algebra to a von Neumann algebra is that we can use polynomials as building blocks to extend (2.8) to any continuous function on the support of ν_a. We can define τ(f(a)) = ∫ f(x) ν_a(x) dx . (2.9) The operator f(a) on the LHS is defined via a unital *-algebra homomorphism Φ_a mapping f ∈ L∞(R, ν_a) to f(a) ∈ W. Φ_a is known as the functional calculus for a. Since polynomials are dense in the continuous functions supported over [−ρ(a), ρ(a)], (2.9) uniquely determines Φ_a. (cf. Theorem 3.1 in Nica-Speicher [20].) We shall refer to this analytic distribution ν_a as the spectral distribution of a. Such an extension to continuous functions is necessary for us to study spectral functions like the von Neumann entropy.
We now discuss how to define entropies for states acting on a W*-probability space. This was first studied by Segal [40]. (See also a recent article on this subject [41].) In a W*-probability space, the density operator a ∈ (W, τ) 20 of a normal faithful state ϕ on W is defined through the expectation τ by ϕ(x) = τ(xa), ∀x ∈ W, and has the normalization τ(a) = 1. Such an assignment of density operators is unambiguous, as the expectation τ is unique.
The von Neumann entropy of the state ϕ is then defined as usual, S(ϕ) := τ(−a log a) = −∫ x log x ν_a(x) dx, where we've used (2.9) for f(x) = −x log x. 16 The spectral radius ρ(a) of a self-adjoint random variable in (W, τ) is defined as ρ(a) := lim_{n→∞} |τ(|a|^{2n})|^{1/2n}, and the Cauchy-Schwarz inequality implies that the moments are (exponentially) bounded by ρ(a): τ(a^n) ≤ ρ(a)^n. We say a self-adjoint random variable a is bounded if its spectral radius is finite. 17 Self-adjointness is not necessary here for the analytic distribution to exist. We choose to focus on self-adjoint variables because they are the most relevant for us. Generally, for any normal operator (i.e. one that commutes with its involution) in a *-probability space, an analytic distribution can also be constructed from the *-moments τ(a^k (a*)^l), ∀k, l ∈ N. See e.g. Chapter 1 in [20]. 18 For a collection of moments {m_n}_{n=0}^∞ to define a unique probability distribution, it needs to satisfy Carleman's condition, Σ_{n=1}^∞ m_{2n}^{−1/(2n)} = ∞. This condition is always satisfied for a compactly supported probability distribution. 19 See Theorem 2.5.8 in [36] for the more general statement. 20 When the density operator a is unbounded, it does not belong to the algebra W itself but is affiliated to the algebra, which means that bounded functions of a belong to W. For simplicity, we assume that our density operators have bounded spectra.
Similarly, the α-Rényi entropies are defined as S_α(ϕ) := (1/(1−α)) log τ(a^α). The von Neumann entropy so defined takes values in [−∞, 0], where the maximum is attained at the tracial state τ itself, S(τ) = S(1) = 0. This differs from the entropy of density matrices in the usual quantum information convention. Consider now the usual convention for a finite-dimensional matrix algebra (M_N(C), Tr): the density matrix A ∈ (M_N(C), Tr) of a tracial state ϕ is normalized by Tr A = 1, as against the density matrix A_N ∈ (M_N(C), tr) of the same state ϕ normalized by tr A_N = 1, i.e. A_N = N A.

It follows that S(A) = S(A_N) + log N.
We are able to compute the entropy S(a) directly from the freely convoluted spectral distribution. In order to obtain the standard entropy value S(A) for some N × N density matrix A, we simply add back log N . This is a good approximation if N is large enough such that S(A N ) is close to its asymptotic value S(a).
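As a quick numerical sanity check of this normalization (a sketch of our own, not from the original text): take a random density matrix A with Tr A = 1, pass to A_N = N A with tr A_N = 1, and verify that S(A) = S(A_N) + log N.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8

# A random density matrix, normalized in the quantum-information convention Tr A = 1
X = rng.standard_normal((N, N))
A = X @ X.T
A /= np.trace(A)

lam = np.linalg.eigvalsh(A)                 # eigenvalues of A, summing to 1
S_A = -np.sum(lam * np.log(lam))            # usual von Neumann entropy, in [0, log N]
# Density operator of the same state w.r.t. tr = Tr/N is A_N = N*A,
# and S(A_N) = -tr(A_N log A_N) takes values in [-log N, 0]
S_AN = -np.mean(N * lam * np.log(N * lam))

print(S_A - (S_AN + np.log(N)))             # ~ 0, i.e. S(A) = S(A_N) + log N
```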
Finally, we'd like to briefly sketch how to define the relative entropy between two faithful normal states ϕ and ψ on a von Neumann algebra W. Readers can refer to the review article [42] for details.
Consider two faithful normal states ϕ and ψ on a von Neumann algebra W that is not necessarily finite. They can be represented as cyclic and separating vectors |ϕ⟩, |ψ⟩ ∈ H in some Hilbert space H, 21 such that for any x ∈ W, ϕ(x) = ⟨ϕ|x|ϕ⟩ and ψ(x) = ⟨ψ|x|ψ⟩. Now we need to introduce the Tomita operator S_ϕ|ψ, defined via S_ϕ|ψ a|ϕ⟩ = a*|ψ⟩ , ∀a ∈ W . (2.14) The Tomita operator admits the polar decomposition S_ϕ|ψ = J_ϕ|ψ ∆_ϕ|ψ^{1/2}, where J_ϕ|ψ is an anti-unitary operator that sends W to W′, known as the relative modular conjugation, and ∆_ϕ|ψ = S*_ϕ|ψ S_ϕ|ψ is the relative modular operator. Now we can define the relative entropy S(ϕ∥ψ) := −⟨ϕ| log ∆_ϕ|ψ |ϕ⟩ . For a finite von Neumann algebra (W, τ), the relative entropy can also be expressed in terms of the corresponding density operators using the more familiar Umegaki formula, S(ϕ∥ψ) = τ(a log a − a log b), where a and b are the density operators of ϕ and ψ respectively.
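In finite dimensions, Umegaki's formula is easy to evaluate directly. The sketch below (our own illustration, using the Tr convention; the value is unchanged under the normalized trace tr = Tr/N because the explicit factors of N cancel) checks that the relative entropy is nonnegative and vanishes for identical states.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 4

def logm_sym(m):
    # matrix logarithm of a positive definite symmetric matrix via eigendecomposition
    w, v = np.linalg.eigh(m)
    return v @ np.diag(np.log(w)) @ v.T

def rand_density(n):
    x = rng.standard_normal((n, n))
    m = x @ x.T
    return m / np.trace(m)

a, b = rand_density(N), rand_density(N)
# Umegaki formula: S(phi||psi) = Tr(a log a - a log b) >= 0, with equality iff a = b
S_rel = np.trace(a @ (logm_sym(a) - logm_sym(b)))
print(S_rel)
```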

Freeness
We have discussed how to make sense of the spectral distribution of a single bounded self-adjoint random variable. Can one go beyond the case of a single random variable? This calls for an appropriate notion of independence among non-commutative random variables. As we shall see, the very notion of freeness is an appropriate one that is alternative to the more familiar tensor independence, and free probability theory is the study of free non-commutative random variables.
Definition 4 (Freeness). Random variables a_1, . . . , a_d in (W, τ) are free if τ( [p_1(a_{i_1}) − τ(p_1(a_{i_1}))] · · · [p_k(a_{i_k}) − τ(p_k(a_{i_k}))] ) = 0 (2.18) for any polynomials p_1, . . . , p_k and indices i_1, . . . , i_k ∈ [d] with no two adjacent indices equal.
Let's unpack this definition of freeness. In words, it says "the alternating product of centered random variables is centered". As advocated by Speicher, we can think of freeness as a convenient rule that allows the joint moments to be expressed in terms of the moments of the individual variables. For example, the definition (2.18) implies that for free a and b, τ(ab) = τ(a)τ(b), as follows from expanding τ((a − τ(a))(b − τ(b))) = 0. It is not immediately obvious that (2.18) implies one can break down the joint moment of any word into the individual moments of its free letters, but we shall later show that this is indeed true. Since the moments define the non-commutative distributions, freeness should be understood as a notion of independence in the sense that the joint distribution of several free random variables is completely determined by the distributions of the individual ones. Note however that freeness is distinct from the notion of (tensor/Cartesian) independence, which concerns scalar random variables 22 or commutative objects more generally. Even in a non-commutative probability theory, such as quantum theory (A, ϕ), independence concerns commuting self-adjoint operators representing observables that are simultaneously measurable. We say two commuting observables a and b are independent in (A, ϕ) if ϕ(a^{m_1} b^{n_1} a^{m_2} b^{n_2} · · ·) = ϕ(a^{m_1+m_2+···}) ϕ(b^{n_1+n_2+···}). 23 This fails, say, when ϕ is a Bell state shared by Alice and Bob and a, b are spacelike separated local observables of Alice and Bob; their observations (measurement outcomes) are then correlated.
On the other hand, freeness (2.18) gives a distinct rule for how the joint moments factorize for non-commutative random variables. We can use the following example to illustrate the difference. Consider two independent commuting random variables a, b in a W*-probability space (W, τ) with zero means but positive second moments. Then τ(abab) = τ(a²)τ(b²) > 0, so they cannot be free, because freeness would instead demand τ(abab) = τ(a²)τ(b)² + τ(a)²τ(b²) − τ(a)²τ(b)², which vanishes for such centered variables. In fact, one can show that for a pair of commuting random variables to be free, one of them has to be a scalar, and a scalar is free from anything. 24 Hence, freeness and independence are almost orthogonal notions, and neither is stronger than the other. 22 Scalars are operators that are proportional to the identity. 23 Rather than talking about independence between two observables for a given state, one can also discuss notions of independence between two observable subalgebras associated with different subsystems for arbitrary states. These are commonly formulated as the ability to independently prepare states for each subsystem. There are various ramifications of such notions, and commutativity between the subalgebras is not generally demanded. See [45] and references therein. 24 See for example Propositions 1.10 and 1.11 in [22].
Despite this stark difference, freeness is a sensible counterpart of independence that is tailor-made for non-commuting random variables. This is because the key theorems and properties of independent random variables naturally have free analogs for free random variables. Examples are various limit theorems, such as the free central limit theorem [46] and the law of large numbers [47,48], the free convolutions of distributions [46,49], and the classification of infinitely divisible and stable laws [50,51]. An example that is particularly relevant for us is the free Poisson limit theorem. It is the non-commutative analog of the Poisson limit theorem, which asserts that the sum of Bernoulli random variables (coin flips) asymptotes to a Poisson random variable. We will discuss it later in Section 2.5.
Conversely, given any probability measure ν with compact support on R, it's always possible to construct a W * -probability space (W, τ ), such that there exists some self-adjoint a ∈ W and ν a = ν. (cf. e.g. Facts 4.2 in [22]).
Furthermore, given a collection of non-commutative probability spaces (W_i, τ_i), i ∈ [n], one can always amalgamate them into a larger probability space (W, τ) such that W_i ⊂ W, τ(a_i) = τ_i(a_i) for all a_i ∈ W_i, and any collection of random variables drawn individually from these subalgebras, {a_i}, a_i ∈ W_i, is free (cf. e.g. Lemma 2.5.20 in Tao [36]). This is achieved by the free product. 25 We shall spare the details of the free product here; interested readers may consult chapters 5-7 in Nica-Speicher [20]. It allows us to introduce free random variables of specified distributions in a joint probability space, just as we are used to doing in classical probability theory. The only difference is that we replace the Cartesian/tensor product by the free product.
Free probability was born in the field of operator algebras [38]. Nonetheless, it was later realized by Voiculescu that it has an intimate connection with random matrix theory. In Voiculescu's seminal work in 1991 [28], he proved the first result connecting asymptotic random matrices and free probability theory (cf. Theorem 2 in Appendix B for the precise statement). Roughly speaking, Voiculescu's theorem implies that random matrices with classically independent matrix elements tend to be free in the asymptotic limit. In fact, asymptotic freeness would not hold without the non-commutativity, as we argued above; heuristically, we can say that any finite-dimensional random matrices are still too commutative to be free. When studying an abstract non-commutative random variable, it is often useful to use a sequence of random matrices as a concrete model. We will later also make contact with random matrices for a better understanding of the free probabilistic calculations. We postpone more details regarding this connection to Appendix B.
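Asymptotic freeness can be checked numerically (a sketch under the stated setup, not from the original text): conjugate two fixed spectra by independent Haar orthogonal rotations and compare the mixed moment τ(abab) with the free prediction. For exactly centered a and b it should vanish, whereas commuting independent variables with the same spectra would give τ(a²)τ(b²) = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1500

def haar_orthogonal(n):
    # QR of a Gaussian matrix, with sign correction, samples the Haar measure on O(n)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

tr = lambda x: np.trace(x).real / N

# Two +/-1 spectra (exactly centered), conjugated by independent Haar rotations:
# such matrices become asymptotically free as N -> infinity
d = np.ones(N); d[: N // 2] = -1.0
U, V = haar_orthogonal(N), haar_orthogonal(N)
a = U @ np.diag(d) @ U.T
b = V @ np.diag(d) @ V.T

print(tr(a @ b @ a @ b))  # freeness: ~ 0; commuting independence would give 1
```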
At the level of moments, it is somewhat cumbersome to describe freeness using (2.18). We now introduce an alternative characterization in terms of the free cumulants. We first define the notion of non-crossing (NC) partitions. 26 We denote the set of NC partitions of [n] as NC_n, and we use |π| to denote the number of blocks of π ∈ NC_n. Given a finite set [n] of size n, consider doubling it with additional elements 1̄, . . . , n̄ and interlacing them with 1, . . . , n in an alternating way: 1, 1̄, . . . , n, n̄. For any π ∈ NC_n, its Kreweras complement [52], π̄ ∈ NC(1̄, . . . , n̄) ≅ NC_n, is defined as the biggest 27 element in NC(1̄, . . . , n̄) such that π ∪ π̄ ∈ NC(1, 1̄, . . . , n, n̄). 28 See Fig. 3b for an illustration. Note that generally, for any two partitions π_1, π_2 ∈ P_n, we have |π_1| + |π_2| ≤ |π_1 π_2| + n. 29 The equality |π| + |π̄| = n + 1 is saturated if and only if π is an NC partition and π̄ its Kreweras complement. We are now ready to define free cumulants. 25 Here we stick to the standard terminology. Note that the term free product is used instead for the free multiplicative convolution in Cheng et al [35], which should not be confused with the algebraic operation we are referring to here. 26 The non-crossing partitions are in one-to-one correspondence with the non-crossing permutations: for each NC partition π = {V_i}_i, we let the NC permutation be defined with cycles (V_i)_i such that each cycle (V_i) consists of the elements in the block V_i arranged in increasing order.
Definition 5. For a unital linear functional τ : W → C, the free cumulants are multilinear maps κ_n : W^n → C defined recursively via τ(a_1 · · · a_n) = Σ_{π∈NC_n} Π_{V∈π} κ_{|V|}(a_{v_1}, . . . , a_{v_{|V|}}) , (2.20) where V = {v_1 < · · · < v_{|V|}} runs over the blocks of the NC partition π and |V| denotes its cardinality.
We shall often drop "free" and just call them cumulants. We refer to (2.20) as the moment-cumulant formula. There is only one term in the sum involving the highest cumulant κ_n, so the formula can be inverted inductively to express the cumulants in terms of the moments. 30 Therefore, (2.20) is indeed a valid definition of the free cumulants.
We are often interested in the relation between the moments and cumulants of a single random variable a, for which we abbreviate the n-th moment and n-th cumulant of a as m_n(a) := τ(a^n), κ_n(a) := κ_n(a, . . . , a). (2.21) We then have a special case of the free moment-cumulant formula (2.20), m_n(a) = Σ_{π∈NC_n} Π_{V∈π} κ_{|V|}(a) . (2.22) We remark that for a single self-adjoint operator a, the moments m_n(a) are identical to the classical moments of a random variable X whose probability distribution is the spectral distribution of a. Because of the distinction between freeness and independence, however, the corresponding classical cumulants, denoted u_n(X), are defined differently from κ_n(a): the same formula (2.22) is used up to replacing the NC partitions by all partitions of [n], denoted P_n, yielding the classical moment-cumulant formula, E[X^n] = Σ_{π∈P_n} Π_{V∈π} u_{|V|}(X) . (2.23) Hence, one can see that κ_n and u_n with n ≥ 4 differ from each other. The classical cumulants u_n(X) are also known as the Ursell functions in statistical physics or the connected correlation functions in QFT. In a Feynman diagram expansion of a partition function in QFT, the n-th cumulant 27 There is a natural partial order defined on NC partitions, called the reverse refinement order. We say π_1 ≤ π_2 if each block of π_1 is contained in a block of π_2.
28 The permutation corresponding to π̄ is equivalently given by π^{−1}η, where η is the cyclic permutation [53]. 29 One can define a metric on the permutations known as the Cayley distance, d(π_1, π_2) := n − |π_1^{−1}π_2|, which counts the minimal number of transpositions needed to transform π_1 into π_2. The inequality |π_1| + |π_2| ≤ |π_1 π_2| + n then follows from the triangle inequality for the Cayley distance. 30 More precisely, the set of non-crossing partitions forms a lattice with respect to the refinement order, and the cumulants are given by the Möbius inversion with respect to this order. u_n(φ) defined for a field φ gives the connected n-point Green's functions that connect exactly n sources. A cumulant u_n(φ) should thus be understood as capturing genuine n-partite interactions, as opposed to independence or correlations that can be decomposed into interactions among fewer parties. Analogously, we will see that the free cumulants are precisely computed by the replica wormholes in the planar gravitational saddles of the replica trick GPI.
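The inversion of (2.22) for a single variable can be made concrete with a short script (an illustrative sketch of our own, not from the original text): grouping the NC partitions by the block containing the element 1 yields the recursion m_n = Σ_{s=1}^n κ_s Σ_{i_1+···+i_s = n−s} m_{i_1} · · · m_{i_s}, which we solve for the cumulants. Feeding in the Catalan numbers, the moments of the free Poisson distribution with rate 1, returns κ_n = 1 for all n, while the semicircle moments return κ_2 = 1 and nothing else.

```python
from itertools import product

def free_cumulants(m, nmax):
    """Invert the single-variable free moment-cumulant formula (2.22).
    m[n] is the n-th moment with m[0] = 1; returns [kappa_1, ..., kappa_nmax]."""
    kappa = [0.0] * (nmax + 1)
    for n in range(1, nmax + 1):
        # m_n = sum_{s=1}^{n} kappa_s * sum_{i_1+...+i_s = n-s} m_{i_1} ... m_{i_s}
        total = 0.0
        for s in range(1, n):
            for comp in product(range(n - s + 1), repeat=s):
                if sum(comp) == n - s:
                    p = kappa[s]
                    for i in comp:
                        p *= m[i]
                    total += p
        kappa[n] = m[n] - total   # the s = n term is kappa_n itself
    return kappa[1:]

catalan = [1, 1, 2, 5, 14, 42, 132]
print(free_cumulants(catalan, 6))        # -> all 1: free Poisson with rate 1
semicircle = [1, 0, 1, 0, 2, 0, 5]
print(free_cumulants(semicircle, 6))     # -> kappa_2 = 1, all others 0
```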
Here comes the punchline: Freeness is equivalent to vanishing mixed cumulants.
Theorem 1 (Speicher [54]). Random variables a and b in (W, τ) are free if and only if all their mixed cumulants vanish, i.e. κ_n(a_{i_1}, . . . , a_{i_n}) = 0 whenever the arguments a_{i_j} ∈ {a, b} include both a and b. It's now clear that any joint moment of free random variables can be written in terms of the individual cumulants, because the mixed cumulants vanish. Since each individual cumulant is a polynomial in the individual moments, via inverting the moment-cumulant formula, we have established the desired property that the joint moments of free random variables are polynomials in the individual moments. This drastic simplification is what freeness buys us.
As discussed previously, from the individual moments of a collection of free self-adjoint (or normal cf. Footnote 17) random variables, we can extract the spectral distribution of each random variable. Their freeness further allows us to extract the analytic distributions for some combinations of free random variables via tools that we will now come to.

Free harmonic analysis
We now introduce some tools from free harmonic analysis that will prove useful.

The Cauchy transform
We start with the Cauchy-Stieltjes transform, 31 or simply the Cauchy transform, of a distribution µ on R, G_µ(z) := ∫_I dµ(x)/(z − x) , (2.24) where I denotes the support of µ on the real line. The Cauchy transform is defined on both the upper and lower complex half-planes, denoted C⁺ := {z : Im(z) > 0} and C⁻ := {z : Im(z) < 0} respectively. However, note that G_µ(z)* = G_µ(z*), so we only need to focus on the domain C⁺, on which it is a holomorphic function from C⁺ to C⁻. We can extract the spectral distribution µ(x) via the inverse transform, known as the Stieltjes inversion, µ(x) = −(1/π) lim_{ε→0⁺} Im G_µ(x + iε) . (2.25) Consider now a self-adjoint operator a ∈ (W, τ). The Cauchy transform of its spectral distribution can be regarded as a moment generating function for a, G_{ν_a}(z) = τ((z − a)^{−1}) = Σ_{n=0}^∞ m_n(a)/z^{n+1} . (2.26) In random matrix theory, the Cauchy transform of the spectral distribution of a random matrix A is also known as the (normalized trace of the) resolvent of A. 31 The Stieltjes transform usually refers to the Cauchy transform with a minus sign.
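As a simple illustration (a sketch of our own, not from the original text): the standard semicircle distribution on [−2, 2] has the closed-form Cauchy transform G(z) = (z − √(z² − 4))/2, on the branch with G(z) ~ 1/z at infinity. Evaluating just above the real axis and applying the Stieltjes inversion (2.25) recovers the density √(4 − x²)/(2π).

```python
import numpy as np

def G_semicircle(z):
    # Cauchy transform of the standard semicircle law; branch fixed by G(z) ~ 1/z
    return (z - np.sqrt(z**2 - 4 + 0j)) / 2

eps = 1e-8
for x in [0.0, 0.5, 1.0, 1.5]:
    rho = -G_semicircle(x + 1j * eps).imag / np.pi   # Stieltjes inversion (2.25)
    exact = np.sqrt(4 - x**2) / (2 * np.pi)          # semicircle density
    print(f"x={x}: {rho:.6f} vs {exact:.6f}")
```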

Free additive convolution and R-transform.
Freeness allows us to break down the moments of any polynomial of several free random variables in terms of the cumulants/moments of each individual variable. It's natural to ask about the spectral distribution ν_{a+b} of the sum of two free self-adjoint random variables a and b. This distribution is known as the free additive convolution of ν_a and ν_b [46], ν_{a+b} = ν_a ⊞ ν_b, where the notation ⊞ indicates that this spectral distribution can be obtained from the spectral distributions ν_a and ν_b alone, thanks to freeness. The free additive convolution is an associative and commutative operation. Note this operation is defined here for any two compactly supported distributions on R. 32 Therefore, it only depends on the spectral distributions ν_a and ν_b rather than on the specific variables a and b, because different random variables can share the same spectrum.
In order to extract the convoluted distribution, we look at its cumulants κ_n(a + b). Recall that free cumulants are multilinear maps and that the mixed free cumulants of free random variables vanish, from which it immediately follows that, for any n, κ_n(a + b) = κ_n(a) + κ_n(b) . (2.28) Therefore, we obtain a simple relation between a + b and the individual a and b in terms of the free cumulants. Now we need to make an analytic connection back to the moments via the R-transform.
For a spectral distribution ν_a, its R-transform is defined as R_{ν_a}(z) := Σ_{n=0}^∞ κ_{n+1}(a) z^n . (2.29) The R-transform can be viewed as a free cumulant generating function. Note that unlike the Cauchy transform, which is analytic on the entire C⁺, the R-transform is usually only analytic for small z, on a domain that depends on the measure ν_a. See Chapter 3 in Mingo-Speicher [21] for a detailed account of its analytic domain. Then (2.28) implies that the free additive convolution becomes additive under the R-transform, R_{ν_a ⊞ ν_b}(z) = R_{ν_a}(z) + R_{ν_b}(z) . (2.30) Recall that the Cauchy transform (2.24) is like a moment generating function. Speicher showed that it is related to the R-transform via [54] R_{ν_a}(G_{ν_a}(z)) + 1/G_{ν_a}(z) = z . (2.31) Treating G_{ν_a}(z) as the variable, we can write this as G_{ν_a}^{−1}(z) = R_{ν_a}(z) + 1/z, where G_{ν_a}^{−1} is the inverse function of the Cauchy transform. It can be further rewritten as G_{ν_a}(R_{ν_a}(z) + 1/z) = z. Hence, the Cauchy transform is the functional inverse of R_{ν_a}(z) + 1/z. 33 The equation (2.31) connects the R-transform to the Cauchy transform and thus to the spectral distribution. Let's summarize the algorithm for the free additive convolution: 32 It can also be defined for distributions with non-compact support on R [55]. 33 This is sometimes used as an alternative definition of the R-transform. Algorithm 1. Free additive convolution ν_a ⊞ ν_b via the R-transform: 1. compute the Cauchy transforms from the spectral measures ν_a and ν_b using (2.24); 2. compute the R-transforms using (2.31) and apply (2.30) to obtain R_{ν_a ⊞ ν_b}; 3. find the Cauchy transform of ν_a ⊞ ν_b and extract it via the Stieltjes inversion (2.25).
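Algorithm 1 can be run end to end symbolically. As a worked example (our own illustration, not from the original text), take ν_a = ν_b = (δ_{−1} + δ_{+1})/2, the spectrum of a symmetric Bernoulli variable; the convolution ν_a ⊞ ν_b is known to be the arcsine law 1/(π√(4 − x²)) on (−2, 2), and the script below recovers its density at a sample point.

```python
import sympy as sp

z, w = sp.symbols('z w')

# Step 1: Cauchy transform (2.24) of nu = (delta_{-1} + delta_{+1})/2
G = z / (z**2 - 1)

# Step 2: functional inverse K = G^{-1} (branch with z*K(z) -> 1 as z -> 0), R = K - 1/z
K = next(c for c in sp.solve(sp.Eq(G.subs(z, w), z), w) if sp.limit(z * c, z, 0) == 1)
R = sp.simplify(K - 1/z)

# R-transforms add under free additive convolution (2.30)
K2 = 2 * R + 1/z
# Step 3: invert back, picking the branch with z*G2(z) -> 1 at infinity
G2 = next(c for c in sp.solve(sp.Eq(K2.subs(z, w), z), w)
          if sp.limit(z * c, z, sp.oo) == 1)

# Stieltjes inversion (2.25) just above the real axis, at x = 1/2
x0 = sp.Rational(1, 2)
rho = -sp.im(sp.N(G2.subs(z, x0 + sp.I * sp.Float(1e-8)))) / sp.pi
print(sp.N(rho), sp.N(1 / (sp.pi * sp.sqrt(4 - x0**2))))  # arcsine law on (-2, 2)
```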

Free multiplicative convolution and S-transform.
Similarly, one can find the spectral distribution of the product of two free positive self-adjoint random variables a and b, ab or ba. In particular, let's consider the positive operator √a b √a, which shares the same spectrum as ab and ba. Its spectral distribution is given by the free multiplicative convolution [49], ν_{√a b √a} = ν_a ⊠ ν_b, where the notation ⊠ indicates that this spectral distribution can be obtained from the spectral distributions ν_a and ν_b alone, thanks to freeness. Just like the free additive convolution, the free multiplicative convolution is an associative and commutative operation. Note this operation is defined for any two compactly supported distributions on R⁺. 34 Therefore, it only depends on the spectral distributions ν_a and ν_b rather than on the specific variables a and b. The convoluted distribution has bounded support if ν_a and ν_b have bounded support.
Let's see how this works with the help of the S-transform. Analogous to the R-transform, which computes the free additive convolution, the free multiplicative convolution can be computed by the S-transform, under which the convolution turns into a multiplication. We first define another moment function ω as an intermediate tool, ω_µ(z) := Σ_{n=1}^∞ m_n(µ) z^n . (2.35) The S-transform S_µ(z) is then defined as S_µ(z) := ((1 + z)/z) ω_µ^{−1}(z) , (2.36) where ω_µ^{−1}(z) is the inverse function of (2.35). Generally, the S-transform is only analytic in a neighbourhood of 0; see for instance [55] for more details on its analytic domain. We shall also note that the S-transform and the R-transform are related via the compositional inverse of the functions zR(z) and zS(z), (zR_µ(z))^{⟨−1⟩} = zS_µ(z) . (2.37) Using this, one can transfer between the R-transform and the S-transform, which shall prove very useful later. The S-transform allows us to compute the free multiplicative convolution with S_{ν_a ⊠ ν_b}(z) = S_{ν_a}(z) S_{ν_b}(z) . (2.38) We then invert (2.36) to obtain ω_{ν_a ⊠ ν_b}(z), thus G_{ν_a ⊠ ν_b}(z) = (ω_{ν_a ⊠ ν_b}(1/z) + 1)/z, and finally the spectral distribution ν_a ⊠ ν_b via the Stieltjes inversion (2.25). Let's summarize the algorithm for the free multiplicative convolution. Algorithm 2. Free multiplicative convolution ν_a ⊠ ν_b via the S-transform: 1. compute the moment functions (2.35) ω_{ν_a} and ω_{ν_b} from the Cauchy transforms (2.24); 2. compute the S-transforms using (2.36) and apply (2.38) to obtain S_{ν_a ⊠ ν_b}; 3. find the Cauchy transform and then extract ν_a ⊠ ν_b via the Stieltjes inversion (2.25).
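Running Algorithm 2 for two copies of the Bernoulli spectrum ν = (δ_0 + δ_1)/2, the spectrum of a trace-1/2 projection, gives ν ⊠ ν = δ_0/2 + dx/(2π√(x(1−x))) on [0, 1], whose first two moments are 1/4 and 3/16. The check below (our own sketch, not from the original text) compares these predictions against the spectrum of pqp for two independently Haar-rotated projections, which are asymptotically free.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000

def haar_orthogonal(n):
    # QR of a Gaussian matrix, with sign correction, samples the Haar measure on O(n)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

d = np.zeros(N); d[: N // 2] = 1.0            # spectrum (delta_0 + delta_1)/2
U, V = haar_orthogonal(N), haar_orthogonal(N)
p = U @ np.diag(d) @ U.T                      # projection of trace 1/2
q = V @ np.diag(d) @ V.T                      # an independent, asymptotically free copy

c = p @ q @ p                                 # spectrum approximates nu ⊠ nu
m1 = np.trace(c) / N
m2 = np.trace(c @ c) / N
print(m1, m2)                                 # predictions: 1/4 and 3/16
```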
Above, we took Voiculescu's approach of introducing the free multiplicative convolution in terms of the S-transforms [49]. The following result due to Nica-Speicher provides a complementary combinatorial perspective on the free multiplicative convolution [20]. Let (W, τ) be a non-commutative probability space and let a, b ∈ (W, τ) be free. Then the moments of the product ab are m_n(ab) = Σ_{π∈NC_n} Π_{V∈π} κ_{|V|}(a) Π_{W∈π̄} m_{|W|}(b) , and the free cumulants are κ_n(ab) = Σ_{π∈NC_n} Π_{V∈π} κ_{|V|}(a) Π_{W∈π̄} κ_{|W|}(b) , where π̄ denotes the Kreweras complement of π. Notice that these formulas look exactly like what the replica trick GPI gives us (1.8). This is the key observation of this work, and we shall match the details later in Section 3.1.
In practice, the free multiplicative and additive convolutions via the S- and R-transforms can be hard to evaluate, either analytically or even numerically. The difficulty lies in dealing with the functional inverses when converting the S- and R-transforms back to the Cauchy transforms. 35 In fact, one can circumvent dealing with the S- and R-transforms directly and instead resort to the subordination functions [59]. They have better analytic properties and boil down to fixed-point equations that are numerically tractable (cf. Chapter 5 in [22] for an introduction).
There is a general procedure developed to evaluate any polynomial of a collection of free random variables (cf. Chapter 10 in Mingo-Speicher [21] and [60] for details). The procedure builds on operator-valued free probability theory [61], which generalizes the scalar-valued theory we have been using. It uses a linearization trick to formulate any polynomial as a linear combination of the free variables with matrix coefficients, turning the problem into an addition of operator-valued free random variables [62,63]. Then, instead of the R-transforms, one can implement the subordination formalism to compute the sum [64]. It turns out, however, that we will not need this general machinery for our problem.

Free compression
Consider a projection p ∈ (W, τ) of trace τ(p) < 1, acting as a ↦ pap on a ∈ (W, τ). The projection p has the spectral distribution τ(p)δ_1 + (1 − τ(p))δ_0. The projected random variables form a subalgebra (pWp, τ_p) ⊂ (W, τ), where τ_p := τ/τ(p) is the renormalized trace and p is the unit element of pWp. This projected subalgebra forms a non-commutative probability space and is referred to as a compression of (W, τ). The compression inherits the structural properties of (W, τ).
Consider now a projection p that is free from some a ∈ (W, τ). We call pap a free compression of a by p. We are interested in applying the free multiplicative convolution to compute its spectral distribution, 35 There are some efforts from the machine learning community in developing algorithms to directly handle inverting the S-transforms [57,58].
where it's customary to normalize the compressed variable by τ (p). One can show that where the power denotes a 1/τ (p)-fold free additive convolution, provided 1/τ (p) is an integer. If we restrict to the subalgebra (pWp, τ p ) that the random variable is compressed to, we switch to the renormalized trace τ p and the corresponding spectral distribution, denoted as ν p pap/τ (p) , is simply (2.42) Surprisingly, we see that a free compression p essentially implements a 1/τ (p)-fold free additive convolution up to a rescaling or τ (p). In particular, we have the following relation for the cumulants defined w.
and thus the R-transforms satisfy It's also useful to write (2.42) as where D τ (p) denotes a dilation (rescaling) of the distribution. This operation is defined to rescale any Equation (2.45) follows from the general fact that if one rescales a self-adjoint operator by a, then its spectral distribution is dilated by 1/λ, In fact, we can use (2.42) to extend the definition of the free additive convolution power to any real number t ≥ 1 with τ (p) = 1/t. 36 Moreover, one can use the free compression to generate, from a given compactly supported ν, a -semigroup 37 of compactly supported probability measures [66]. It is denoted as, (ν t ) t≥1 , and it satisfies ν t+s = ν t ν s , ν 1 = ν . (2.48) Here is how the -semigroup can be constructed. Starting from a given ν, we assign a self-adjoint random variable a with this spectral distribution. We also introduce a projection of trace 1/t and put a and p in a common probability space (W, τ ) under the free product such that p and a are free. Then we consider pap ∈ (pWp, τ p ) and define ν t := ν p tpap = ν t . The semigroup property (2.48) is manifestly satisfied. We will later see an important family of distributions for which the semigroup extends also to 0 < t < 1.
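A quick numerical illustration of (2.42) (our own check, with arbitrary matrix size and seed): compressing a randomly rotated Bernoulli(±1) matrix by a coordinate projection of trace 1/2 and rescaling by 1/τ(p) = 2 should reproduce ν^{⊞2}, i.e. the arcsine law on [−2, 2], whose even moments are m_2 = 2 and m_4 = 6.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 800

# Haar-random unitary from the QR decomposition of a complex Ginibre matrix
# (the diagonal phase fix makes the distribution Haar)
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
Q, R = np.linalg.qr(G)
Q = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

# a: randomly rotated Bernoulli(+-1) operator; p: projection onto the first
# N/2 coordinates; the random rotation makes a and p asymptotically free
d = np.concatenate([np.ones(N // 2), -np.ones(N // 2)])
A = (Q * d) @ Q.conj().T              # A = Q diag(d) Q^*
comp = 2.0 * A[: N // 2, : N // 2]     # pap / tau(p), restricted to the range of p

eigs = np.linalg.eigvalsh(comp)
m2 = np.mean(eigs**2)   # should approach 2 (arcsine law on [-2, 2])
m4 = np.mean(eigs**4)   # should approach 6
```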

Circular, semi-circular, quarter-circular and free Poisson distributions
In this subsection, we introduce some canonical examples of probability distributions in free probability theory that are relevant for our discussions later. These are closely related to some canonical random matrix ensembles and we shall mention some in passing. The details on random matrices are provided in Appendix B.
One of the most important random matrix ensembles is the Gaussian Unitary Ensemble (GUE) of Hermitian matrices. An N × N GUE matrix S_N has i.i.d. complex Gaussian off-diagonal matrix elements, with real and imaginary parts independently distributed with zero mean and variance 1/2N, and the diagonal entries are i.i.d. real Gaussian with zero mean and variance 1/N. The spectral distributions of a sequence of GUE matrices (S_N)_N converge almost surely to the semi-circular distribution,

μ_s(dx) = (1/2π)√(4 − x²) dx ,

with support x ∈ [−2, +2]. We call an element s ∈ (W, τ) a semi-circular element if its spectral distribution μ_s is given by the semi-circular distribution. Its moments are given by the Catalan numbers,

m_n(s) = C_{n/2} for even n, and zero otherwise . (2.50)

Its free cumulants are given by

κ_n(s) = δ_{n,2} . (2.51)

The semi-circular distribution is thus most easily characterized by the fact that its only non-vanishing free cumulant is κ_2. This fact directly leads to the free central limit theorem. The classical theorem says that for a collection of n independent random variables x_1, ..., x_n with zero mean and unit variance, the average (x_1 + ⋯ + x_n)/√n converges (in distribution) to a standard normal random variable. The free probability counterpart is the free central limit theorem, which asserts that for a collection of free self-adjoint random variables {a_i}_{i=1}^n in (W, τ) with zero mean and unit variance (κ_2(a_i) = 1), the average (a_1 + ⋯ + a_n)/√n converges (in distribution) to a semi-circular element s ∈ (W, τ). 38 A circular element c ∈ (W, τ) is defined to be

c := (s_1 + i s_2)/√2

for two free semi-circular elements s_1, s_2. Its spectral distribution is the uniform distribution over the unit disk in the complex plane, called the circular distribution. It can be modelled by a large random matrix ensemble C_N. 38 We can sketch the proof of this theorem here. The spectral distribution of a sum of free random variables is described by the free additive convolution, under which the free cumulants add.
Therefore, the free cumulants of the sum grow linearly in n, whereas the rescaling by √n suppresses the m-th cumulant Σ_i κ_m(a_i) by n^{m/2}, so only the first (m = 1) and second (m = 2) cumulants survive in the limit n → ∞. Since the first cumulant equals the first moment, which vanishes for all a_i, we are left with only a non-zero κ_2 = 1. Hence the limiting distribution is semi-circular, and the free central limit theorem follows.

The ensemble is C_N = (S⁽¹⁾_N + i S⁽²⁾_N)/√2 for two independent N × N GUE matrices S⁽¹⁾_N and S⁽²⁾_N. C_N is called the (square) Ginibre ensemble; its matrix elements are i.i.d. and follow the complex Gaussian distribution of zero mean and variance 1/N. Its limiting spectral distribution is given by the circular distribution. The Ginibre ensemble has the important property that the joint distribution of its matrix elements is invariant under any unitary action from either the left or the right; we say it is both left and right unitarily invariant. 39

Now consider the modulus of a circular element, known as a quarter-circular element,

q := |c| := √(cc*) . (2.55)

Its spectral distribution follows the quarter-circular distribution,

μ_q(dx) = (1/π)√(4 − x²) dx , x ∈ [0, 2] , (2.56)

which is also the spectral distribution of the modulus |s| of a semi-circular element. Its moments are given by m_n(q) = 2^n Γ(1/2 + n/2)/(√π Γ(2 + n/2)). It is more interesting to look at the moments of its square q² = cc*, which are given by the Catalan numbers. The quarter-circular distribution (2.56) is a special instance of the free Poisson distribution. Let us motivate the latter with the free Poisson limit theorem. It is the non-commutative analog of the classical Poisson limit theorem, which asserts that a sum of many rare Bernoulli random variables (coin flips) asymptotes to a Poisson random variable. The free Poisson distribution is obtained from multiple free additive convolutions of a Bernoulli-type distribution with two outcomes, 0 and α, where the outcome α occurs with probability λ/n. Here α is the jump size and λ is the rate of obtaining the outcome α.
We have 40

lim_{n→∞} ((1 − λ/n)δ_0 + (λ/n)δ_α)^{⊞n} =: μ_{λ,α} ,

which is known as the free Poisson distribution, and it admits a closed-form expression: it is supported on [α(1−√λ)², α(1+√λ)²] with density √(4λα² − (x − α(1+λ))²)/(2παx), together with an atom (1 − λ)δ_0 when 0 < λ < 1.

39 See, for instance, Lemma 1 in [67]. 40 The classical Poisson distribution is obtained by taking the classical ∗-convolution power.

Consider a generalization of the free Poisson limit theorem in which the jump size is not a deterministic value α but rather follows some probability distribution ν,

lim_{n→∞} ((1 − λ/n)δ_0 + (λ/n)ν)^{⊞n} =: μ_{λ,ν} .

This is known as the free compound Poisson distribution. It is most easily characterized by its free cumulants,

κ_n(μ_{λ,ν}) = λ m_n(ν) . (2.65)
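The cumulant characterization (2.65) is easy to probe numerically. The sketch below (our own, with an arbitrary two-atom ν and λ = 1) predicts the moments of the free compound Poisson distribution from the free moment-cumulant recursion with κ_n = m_n(ν), and compares them against the empirical moments of a sampled C B C* with C a square Ginibre matrix, the random matrix model of cbc* discussed below.

```python
import numpy as np

def free_moments(kappa):
    """Moments from free cumulants via the non-crossing recursion
    m_n = sum_{s=1}^n kappa_s * sum_{i_1+...+i_s = n-s} m_{i_1}...m_{i_s}."""
    n = len(kappa)
    m = [1.0]  # m_0 = 1
    for k in range(1, n + 1):
        tot = 0.0
        for s in range(1, k + 1):
            # conv[t] = sum over compositions i_1+...+i_s = t of m_{i_1}...m_{i_s}
            conv = [1.0] + [0.0] * (k - s)
            for _ in range(s):
                conv = [sum(conv[j] * m[t - j] for j in range(t + 1))
                        for t in range(k - s + 1)]
            tot += kappa[s - 1] * conv[k - s]
        m.append(tot)
    return m[1:]

# nu = (delta_1 + delta_3)/2, so kappa_n(mu_{1,nu}) = m_n(nu) = (1 + 3^n)/2
pred = free_moments([(1 + 3**n) / 2 for n in (1, 2, 3, 4)])

rng = np.random.default_rng(2)
N = 1000
C = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
B = np.diag(np.where(np.arange(N) < N // 2, 1.0, 3.0))  # spectrum nu
M = C @ B @ C.conj().T
emp = [np.trace(np.linalg.matrix_power(M, n)).real / N for n in (1, 2, 3, 4)]
```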
A special case of the free compound Poisson distribution is λ = 1. Then the cumulants of μ_{1,ν} are exactly given by the moments of ν. In this case, if ν is the spectral distribution of some random variable a that is free from a circular element c, then μ_{1,ν} is the spectral distribution of cac*. Namely, conjugation by a circular element swaps moments into cumulants. To see this, consider the moments of cac*, i.e. the free multiplicative convolution between ν_a and the distribution of cc*, which is the square of a quarter-circular element q², and apply the general moment formula of the free multiplicative convolution (2.39). A compound Poisson distribution is a canonical example of the freely infinitely divisible distributions, which are probability distributions μ that can be decomposed as 42

μ = (μ_{1/k})^{⊞k}

for any k ∈ N. We can then make use of μ_{1/k} to define any fractional convolution power μ^{⊞ l/k} as (μ_{1/k})^{⊞l}. By continuity, we obtain a one-parameter family of spectral distributions (μ_t)_{t>0} that forms a ⊞-semigroup under the associative action of the free additive convolution,

μ_{s+t} = μ_s ⊞ μ_t .
Conversely, given such a semigroup (μ_t)_{t>0}, any element μ_t is infinitely divisible. Hence, free infinite divisibility is equivalent to the existence of a ⊞-semigroup for t > 0. We thus see that while every compactly supported probability measure belongs to a ⊞-semigroup with t ≥ 1, the ones with the extended parameter range t > 0 are very special. See Chapter 3 in Hiai-Petz [68] for a good discussion of free infinite divisibility.
It is then evident that the R-transforms of μ_t are simply multiples of the R-transform of μ_1 = μ,

R_{μ_t}(z) = t R_μ(z) .

It turns out that one can be more explicit about the R-transform of an infinitely divisible μ. It admits the following form, known as the free Lévy-Khintchine formula, 43

R_μ(z) = κ_1(μ) + ∫_R z/(1 − xz) dη(x) , (2.70)

for some positive finite measure η on R. Rather than being analytic only for small |z|, such an R_μ(z) is analytic on both C±, and it maps C± → C±. In particular, any compound Poisson distribution μ_{λ,ν} can be decomposed as

μ_{λ,ν} = (μ_{λ/k,ν})^{⊞k} ,

and the semigroup of distributions is defined as μ_t := μ_{tλ,ν}. Using the cumulants (2.65), its R-transform reads

R_{μ_{λ,ν}}(z) = λ Σ_{n≥1} m_n(ν) z^{n−1} = λ ∫_R x/(1 − xz) dν(x) ,

which admits the form of the Lévy-Khintchine formula (2.70) with κ_1 = λ m_1(ν) and dη(x) = λ x² dν(x).
In sum, thanks to free infinite divisibility, the R-transforms of the free compound Poisson distributions are relatively simple to work with. Unlike the free Poisson distribution, a free compound Poisson distribution does not generally admit a closed-form expression for its density, but a closed-form expression for the R-transform is already good enough for many purposes.

determined by the JT gravity + EOW brane theory. We need to sort out some technical issues in order to make them match. Then we use the tools from free harmonic analysis to evaluate the convolution.
We shall also infer from the convolution formula a random matrix model that allows us to compute any spectral function of the radiation density matrix. Surprisingly, it fits perfectly with Page's original proposal for how an evaporating black hole should be treated information-theoretically. To support the validity of the free convolution formula, we perform consistency checks against known results in the literature. Finally, we show how to reformulate the entropy formula so that it is free of free-probability language and resembles Wall's formula for the generalized entropy.

Replica trick as a free multiplicative convolution
We would like to match (1.8) with (2.39). The main idea is that the collection of gravitational partition functions {Z n }, obtained from the connected replica wormholes, should be viewed as free cumulants rather than moments.
These Z_n partition functions are defined by the boundary conditions consisting of alternating JT asymptotic boundaries and EOW branes. They were computed by PSSY based on an earlier calculation by Yang [69]. The results read

Z_n = ∫ dE ρ(E) y(E)^n , (3.1)

where y(E) is a Boltzmann-type factor that is influenced by the EOW brane, and ρ(E) is the density of states. After normalizing by Z_1^n, we can rewrite Z_n/Z_1^n as the n-th moment of some distribution,

Z_n/Z_1^n = ∫ x^n ν_b(x) dx =: m_n(ν_b) ,

where Ẽ(y) is the inverse function of y(E), and we substituted x := y/Z_1 and E(x) = Ẽ(xZ_1) in the second equality; lastly, we have defined the distribution function ν_b(x) from ρ(E(x)) and the Jacobian of this change of variables. Note that Ẽ'(y) < 0, as y(E) is a decreasing function of E. Note also that ρ(E) is not integrable, and neither is ν_b, whose zeroth moment diverges at x → 0; ν_b is therefore not a probability distribution, but only a positive measure over [0, x(0)]. The Bekenstein-Hawking entropy of the JT black hole + EOW brane system is given by the entropy of the measure ν_b. It is more handy to work with a probability distribution, so we let ν_b^N be a regulated probability distribution supported on [x(E_N), x(0)], defined in (3.6) using the dilation operation D_{1/N} of (2.46), where E_N is determined via a normalization constraint. 44 We then have the weak convergence of the distributions. 45 The partition function Z_n is evaluated in the large N limit, which poses some difficulties for treating it probabilistically. We thus work with the regulated probability distribution ν_b^N and take the large N limit at the end; we think of this procedure as undoing the large N limit. Though we have tried to be concrete in (3.6), we shall see that the details of the regularization are not relevant for us, and any other regularization scheme would do equally well. The only requirement we demand is that the moments converge to the moments of ν_b in the large N limit, that is,

lim_{N→∞} m_n(ν_b^N) = m_n(ν_b) ,

which follows from (3.7) for our regulated ν_b^N.
Recall now the replica trick GPI (1.8). (3.10) The moments (1/k) Tr (kR)^n of the given bulk radiation density matrix R define a probability distribution ν_r. Denote the spectral values of R by p_1, ..., p_k. Let us use the moments Z̃_n/(kZ_1^n) to define a distribution ν_r̃, which we will soon confirm is indeed a valid probability distribution. The normalization kZ_1^n is chosen to impose m_0(ν_r̃) = m_1(ν_r̃) = 1, as we want for the radiation density operator r̃, which we shall discuss later. 44 This is also known as the double scaling limit in JT gravity [70]. 45 We say a sequence of distributions μ_N on R converges weakly (or in the weak topology) to some limiting distribution μ if, for any bounded continuous function f, ∫ f dμ_N → ∫ f dμ. We therefore have the moments of ν_r̃,

m_n(ν_r̃) := Z̃_n/(kZ_1^n) . (3.12)

We can also define the moments of the corresponding distribution ν_r̃^N at finite N,

m_n(ν_r̃^N) = (k/N)^n Σ_{π∈NC(n)} [Π_{V∈π} (N/k) m_{|V|}(ν_b^N)] [Π_{V'∈K(π)} m_{|V'|}(ν_r)] , (3.13)

with K(π) the Kreweras complement of π, and they satisfy

m_n(ν_r̃) = lim_{N→∞} m_n(ν_r̃^N) . (3.14)

Comparing the free multiplicative convolution formula (2.39) with (3.13), we see that ν_r̃^N is indeed a valid probability distribution,

ν_r̃^N = (ν_r ⊠ μ_{N/k, ν_b^N}) ∘ D_{k/N} = ν_r ⊠ (μ_{N/k, ν_b^N} ∘ D_{k/N}) , (3.15)

where the compound Poisson distribution μ_{N/k, ν_b^N} is read off from the free cumulants (N/k) m_n(ν_b^N) in (3.13), and the factor (k/N)^n in front gives rise to the dilation D_{k/N}. In the second equality we have put the dilation inside the second factor of the free convolution. 46 Note that the factors Π_{V∈π} (N/k) m_{|V|}(ν_b^N) in (3.13) are the moments of the measure (N/k)ν_b^N, which integrates to N/k. Since (N/k)ν_b^N has bounded support, it is the unique measure with moments (N/k)m_n(ν_b^N). It is therefore not a probability distribution either. Nevertheless, the main point is that these moments constitute a legitimate set of free cumulants of a free compound Poisson distribution.
In the large N limit, N → ∞, the cumulants (k/N)^{n−1} m_n(ν_b^N) converge, so the limiting distribution of (3.15) is well-defined. We can denote the limiting distribution in the bracket as

μ_{ν_b} := lim_{N→∞} μ_{N/k, ν_b^N} ∘ D_{k/N} ,

and we obtain the result claimed in our introduction (1.9), that the physical radiation spectrum is given by the convolution of the spectral data of the gravitational sector and the matter sector respectively,

ν_r̃ = ν_r ⊠ μ_{ν_b} . (3.17)
We should emphasize that the distribution μ_{ν_b}, and thus the end result ν_r̃, are independent of the way we regularize ν_b into ν_b^N. We simply made an explicit choice in (3.6), but the details of the regularization are irrelevant. 46 This is legitimate because (ν_a ⊠ ν_b) ∘ D_λ is the spectral distribution of λ√a b √a for some free random variables a, b and a scalar λ. Clearly, we can associate the scalar multiple with b (or a), yielding ν_a ⊠ (ν_b ∘ D_λ) = (ν_a ⊠ ν_b) ∘ D_λ. Now we have a justification of the replica trick as a well-posed moment problem. The ν_r̃ defined in (3.12) is the legitimate probability distribution consistent with the moments given by the normalized partition functions Z̃_n/(kZ_1^n) in (1.8). Furthermore, ν_r̃ has compact support, because both ν_r and μ_{ν_b} are compactly supported. Hence, the replica trick is mathematically justified as a soluble Hausdorff moment problem, which admits a unique solution.
Let us also quickly mention the results for a microcanonical ensemble of fixed energy E ± ΔE, for which the partition functions become much simpler, with S := log(ΔE ρ(E) e^{S_0}) the entropy of the microcanonical ensemble. This corresponds to the canonical-ensemble partition functions (3.13) for ν_r̃^N with N = 2^S and a flat ν_b^N. We denote the corresponding radiation spectral distribution as ν_r̃; its moments follow from (3.13) with flat ν_b^N, and the resulting spectral distribution is a special case of the general result (3.15).
Given the spectral distribution ν_r̃ of the radiation density operator r̃ in the fundamental (boundary) description, its von Neumann entropy S(r̃) is given by (2.10),

S(r̃) = −∫ x log x dν_r̃(x) , (3.23)

and the Rényi entropies are given by (2.11). We can relate them to the (standard) von Neumann entropy of the k × k radiation density matrix R̃ using (2.12), S(R̃) = S(r̃) + log k.
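The trace bookkeeping behind S(R̃) = S(r̃) + log k can be checked in a few lines of pure Python (with a toy spectrum of our own choosing): if a k × k density matrix has eigenvalues p_i, then r̃ = kR has spectral distribution (1/k)Σ_i δ_{k p_i} w.r.t. the normalized trace τ = Tr/k, and S(r̃) = −τ(r̃ log r̃) differs from S(R) by exactly log k.

```python
import math

p = [0.4, 0.3, 0.2, 0.1]   # eigenvalues of a k x k density matrix R (toy example)
k = len(p)

# standard von Neumann entropy S(R) = -Tr R log R
S_R = -sum(x * math.log(x) for x in p)

# entropy of r~ = kR w.r.t. the normalized trace tau = Tr/k:
# atoms k*p_i, each carrying weight 1/k
S_rt = -sum((1.0 / k) * (k * x) * math.log(k * x) for x in p)
```

Indeed S_R equals S_rt + log k up to rounding, which is the stated relation.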
The convolution formula for entropy (3.23) supersedes the island formula and provides a more accurate account of the radiation entropy in the PSSY model. It applies to any entanglement spectrum of the bulk state. See Fig. 4 for an example. The Page curve can be obtained once we are given a sequence of input spectral distributions of the bulk state at different times.
Using the free multiplicative convolution, we can now resolve the spectrum even when the QES is not defined. Note that the generalized entropy is no longer relevant in our convolution formula (3.23). Islands and QES appear when the convolution factorizes. This happens when a particular summand dominates the convolution expansion (3.12), yielding a factorized expression for the moments, and then the generalized entropy shows up in the entropy calculations. Such special cases hide the more fundamental fact that the partition functions are described by a free multiplicative convolution between the quantum information encoded in the two competing QESs. We will discuss when the convolution factorizes in Section 3.5.1.

Evaluating the free multiplicative convolution formula
We now want to evaluate the free multiplicative convolution (3.17) via the S-transforms, following Algorithm 2. As mentioned, this algorithm is tricky to implement for two arbitrary probability distributions. Fortunately for us, we can reformulate the calculation in terms of the R-transforms, which are better understood and easier to handle for the free compound Poisson distributions that we are dealing with. Translating everything to the R-transforms is therefore the key to our derivation below.
We are eventually interested in the spectral distribution of (3.17), but let's first examine the case at finite N (3.15).
where we denote the distribution in the bracket as μ_{ν_b;N}. We proceed by computing the S-transforms S_{ν_r} and S_{μ_{ν_b;N}}. Comparing the formal series of the moment function (2.35) and the R-transform (2.29), we have

ω_{ν_r}(1/z) = z R_{μ_{1,ν_r}}(z) , (3.28)

where we use the fact that the free compound Poisson distribution μ_{1,ν_r} has the moments of ν_r as its free cumulants, κ_n(μ_{1,ν_r}) = m_n(ν_r). Setting x := ω_{ν_r}(1/z), we can express S_{ν_r} in terms of R_{μ_{1,ν_r}}. Now consider the S-transform S_{μ_{ν_b;N}}, which follows from (2.37). We would like to solve for the inverse function x(y), so as to obtain the Cauchy transform via (2.35), defined for any y ∈ C+.
where we have used (3.33), and the functions z(y), z̃(y) are given implicitly in terms of the R-transforms R_{μ_{1,ν_r}} and R_{μ_{ν_b;N}}. Now we have a system of equations only in terms of the R-transforms. We can solve for z(y) by plugging the first equation into the second and eliminating z̃(y),

R_{μ_{ν_b;N}}[R_{μ_{1,ν_r}}(z(y))/y] = y z(y) .
Plugging these formulas into (3.36), we have, for any y ∈ C+, the equation (3.38). Taking N → ∞, we obtain an equation that depends only on the two input distributions ν_r and ν_b, corresponding to the moments fed into the replica trick (1.8). Note that this is a fixed-point equation for z(y) that is numerically easy to solve via an iteration algorithm, and the inputs to the algorithm, the distributions ν_r and ν_b, are ready to use. Given z(y), the Cauchy transform follows, and the spectral distribution ν_r̃ = ν_r ⊠ μ_{ν_b} can then be extracted using the Stieltjes inversion formula (2.25). We thus obtain the algorithm (1.10)-(1.12) claimed in the introduction. 47

We should remark that the relevant random matrices are known as separable sample covariance matrices, which have applications in multivariate statistical inference [24][25][26][27] and wireless communication [71]. They read

√A_N X_N B_N X_N^* √A_N , (3.41)

where X_N is an N × M Ginibre matrix with variance 1/M, and the N × N matrix A_N and the M × M matrix B_N are deterministic and positive semidefinite. This is the sample covariance matrix of the M samples of N data points packaged in the matrix √A_N X_N √B_N. 48 We will find in the next subsection that the random matrix ensemble fundamentally describing the radiation state R̃ (3.51), as inferred from the convolution formula, indeed has the form of a separable sample covariance matrix.
The limiting spectral distribution of (3.41) is given by the same algorithm we presented. This result was previously obtained in the random matrix literature under various weaker assumptions on the random matrix ensemble X_N [24][25][26][27]. Here we have presented a novel and much simpler way to resolve the spectrum using free probabilistic methods. 47 As commented earlier in Section 2.4.3, the free multiplicative convolution is usually tricky to evaluate, even just numerically. Nonetheless, our problem is tractable without resorting to the machinery of operator-valued free probability theory and the subordination formalism. 48 Compared to the sample covariance matrix (2.67) mentioned earlier (cf. Footnote 41), we now allow correlations B_N among the M samples. These correlations are, however, independent of the correlations A_N among the N data points. More explicitly, consider the covariance matrix E[YY*] of all the sample data stacked in one NM × 1 vector Y. It then follows that the covariance matrix is separable, E[YY*] = A_N ⊗ B_N, and hence the name.
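As a sanity check on the convolution picture, one can compare low moments of a sampled separable sample covariance matrix against the free-probabilistic prediction (our own numerical example, in the square case N = M, with arbitrary test matrices): the spectrum of √A X B X* √A approaches ν_A ⊠ μ_{1,ν_B}, whose first two moments work out to m_1 = m_1(A) m_1(B) and m_2 = m_2(A) m_1(B)² + m_1(A)² m_2(B) by the free moment formulas.

```python
import numpy as np

rng = np.random.default_rng(3)
N = M = 1000

# N x M Ginibre with variance 1/M, deterministic PSD "population" matrices A, B
X = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2 * M)
a = np.where(np.arange(N) < N // 2, 1.0, 2.0)            # spectrum of A_N
B = np.diag(np.where(np.arange(M) < M // 2, 1.0, 3.0))   # B_N

sA = np.sqrt(a)
W = (sA[:, None] * (X @ B @ X.conj().T)) * sA[None, :]   # sqrt(A) X B X^* sqrt(A)
eigs = np.linalg.eigvalsh(W)

m1, m2 = np.mean(eigs), np.mean(eigs**2)
# free prediction: m1(A) m1(B) = 1.5 * 2 = 3
#                  m2(A) m1(B)^2 + m1(A)^2 m2(B) = 2.5*4 + 2.25*5 = 21.25
```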

A model in free random variables
We now look for a density operator r̃ in some W*-probability space that has the spectral distribution (3.17). We start by looking for density operators r̃_N with the spectral distribution at finite N (3.15), and then obtain r̃ as the limit.
Consider a collection of W*-probability spaces (W_N, τ_N)_N containing the following random variables r_N, p_N, b_N, c ∈ (W_N, τ_N): p_N is a projection free from the positive self-adjoint operator b_N, which has the spectral distribution ν_b^N as in (3.6); 49 and c is a circular element free from all of r_N, p_N, b_N. 50 The positive self-adjoint operator r_N lives in the compression (p_N W_N p_N, τ_{p_N}), where τ_{p_N} := τ/τ(p_N) = (N/k)τ, and it is also free from p_N. It has the spectral distribution ν_{r_N}. It is evident that τ(b_N) = τ(r_N) = 1, so they are density operators. We seek a positive self-adjoint operator that shares the finite-N spectral distribution (3.15). We postulate that, up to a unitary, 51 this operator is given by

r̃_N := √r_N c b_N c* √r_N . (3.43)

Let us check that (3.43) gives us what we want. It is positive and has unit trace, τ(r̃_N) = τ(r_N) τ(b_N) τ(cc*) = 1, where we used freeness to factorize the trace, and τ(cc*) = 1. So it is a legitimate density operator. We have cb_N c* ∈ (W_N, τ_N), whose spectral distribution is governed by the free compound Poisson distribution μ_{1,ν_b^N}. It is then conjugated by √r_N, yielding the spectral distribution of r̃_N, whose non-zero part gives (3.15).
To extract solely the spectral distribution (3.15), consider now the p_N that projects onto the support of r_N and r̃_N. Note that both r_N and r̃_N live in the subalgebra (p_N W_N p_N, τ_{p_N}). We have τ_{p_N}(r_N^n) = m_n(ν_r). The projected p_N r̃_N p_N reads

p_N r̃_N p_N = √r_N (p_N c b_N c* p_N) √r_N ,

since p_N acts as the identity on the support of r_N. Since the projection p_N acts as a free compression on cb_N c*, we have p_N c b_N c* p_N ∈ (p_N W_N p_N, τ_{p_N}), whose spectral distribution is μ_{N/k, ν_b^N} ∘ D_{k/N}; this follows from (2.45). The resulting spectral distribution of p_N r̃_N p_N then matches (3.15), as we want. 49 Note that we cannot assign m_n(ν_b) as moments to any random variable, because ν_b is not a probability distribution. 50 As discussed at the end of Section 2.3, the desired W*-probability space can always be constructed to accommodate these free random variables. We can actually be more concrete by realizing (W_N, τ_N) as the free product of two Type I matrix algebras representing the gravitational sector and the matter sector respectively, (W_N, τ_N) = M_N(C) * M_k(C). The former hosts the observables of the JT + EOW brane system at finite N, and the latter hosts the observables of the k-dimensional radiation reservoir. 51 We can only determine the density operator up to a unitary transformation, because the replica trick only provides information about the spectrum.
We now take the large N limit. Let r̃ ∈ (W, τ) be a random variable that has the spectrum ν_r̃ (3.17). Consider the limit of the sequence of freely compressed variables (p_N r̃_N p_N)_N ∈ (p_N W_N p_N, τ_{p_N})_N ⊂ (W_N, τ_N). 52 Then it follows from (3.17) that p_N r̃_N p_N converges to r̃ in distribution, 53

p_N r̃_N p_N → r̃ as N → ∞ .
where we switched to the algebra (W N , τ N ) in the third equality and used freeness several times. It follows from the convergence thatr also has unit trace, so this is the density operator we want.
Let us now examine the complementary black hole system. Since the global state on the black hole-radiation system is chosen to be pure, the black hole density operator b̃_N shares the same spectrum as the radiation system. We can flip the operator ordering in (3.43) and posit that b̃_N is given by

b̃_N := √b_N c* r_N c √b_N , (3.50)

which shares the spectral distribution (3.45) with r̃_N. We should think of b̃_N as the density operator for the black hole at finite N. However, the large N limit makes it unclear whether one can assign a density operator b̃ to the black hole that shares this spectrum. This is unlike the case of r̃, which is defined with the help of a sequence of free compressions. In fact, we claim that no sensible density operator exists for the black hole in the large N limit. Our claim is supported by the fact that in the large N limit, the higher moments (n > 1) of the distribution (3.45) are unbounded, indicating that there is no meaningful density operator to which the sequence (b̃_N)_N converges. Similarly, ν_b^N → δ_0 in the large N limit, so there is no density operator that (b_N)_N converges to either.
If b_N and b̃_N do belong to the algebra of boundary observables at finite N, these limiting traces being either infinite or zero indicate that the large N algebra of boundary observables is no longer finite (for instance, a Type III von Neumann algebra or a Type II∞ von Neumann algebra), so it ceases to qualify as a W*-probability space.
Nonetheless, physically we still expect the entropy of the black hole to be identical to the radiation entropy for a global pure state. How, then, can we compute the black hole entropy directly from b̃_N? We shall provide an answer later in Section 3.4.

A random matrix model
Physically, the radiation density operators r̃, p_N r̃_N p_N, etc. should be modelled as density matrices, for the radiation reservoir is simply a k-dimensional system. They are, however, fundamentally described as random density matrices, following the intimate connection between free probability and random matrix theory. This supports the ensemble interpretation of the replica wormholes [6,[72][73][74][75].
On the other hand, the black hole system can be approximately described by a boundary random matrix theory [70]. Therefore, we can also meaningfully assign (random) density operators to model the random variables b_N, b̃_N, at least at finite N. In the large N limit, the operator algebra describing the boundary theory is presumably no longer a finite von Neumann algebra, so we lose the trace needed to define density operators. Let us therefore stay at finite N and discuss the random matrix model that describes the outcomes of the replica trick calculation.
We shall assume that N is large enough, N ≫ k, to approximate any large N limit result. For instance, the value of S_BH obtained from B_N is arbitrarily close to (3.4). At finite N, we can also embed the given k-dimensional radiation density matrix R in M_N(C), so that we can discuss the black hole and radiation density matrices on an equal footing.
The following random matrix ensemble models (3.43): 54

R̃_N = √R_N C_N B_N C_N^* √R_N , (3.51)

where R̃_N is the N × N boundary radiation density matrix; R_N ∈ M_N(C) is the embedded version of R ∈ M_k(C), supported on an arbitrary k-dimensional subspace of the Hilbert space on which M_N(C) acts; B_N is an N × N diagonal random matrix (in an arbitrary basis) with the spectral distribution ν_b^N; 55 and C_N is an N × N Ginibre matrix with variance 1. As we have alluded to, the ensemble R̃_N (or P_N R̃_N P_N) has the form of a separable sample covariance matrix (3.41).
Similarly, the random matrix model of the black hole state at finite N (3.50) reads 56

B̃_N = √B_N C_N^* R_N C_N √B_N . (3.52)

The radiation density matrix in the large N limit is modelled by a k × k density matrix under the rank-k projection P_N onto the support of R_N. It is given by

R̃ = P_N R̃_N P_N . (3.53)

We claim that R̃_N, B̃_N, R̃ model, in finite dimensions, the density operators r̃_N, b̃_N, r̃ defined in (3.43), (3.50), (3.48), in the sense that the spectral distributions of the former are well approximated by those of the latter at large k and N. We have picked the normalization such that E B̃_N, R̃_N and R̃ are all normalized with the standard trace (Tr) to a good approximation at large k and N, so they can be viewed as density matrices. 57 We provide the justification in Appendix B.
The entropies of R̃, R̃_N, B̃_N are then also random variables on the underlying probability space (Ω, F, P). For a separable sample covariance matrix, it is known that the spectrum of the random matrix almost surely converges (weakly) to the freely convoluted distribution ν_r̃ [24,26,27]. Namely, a typical draw of, say, R̃ has its spectral distribution arbitrarily close to ν_r̃ with high probability at large enough k and N. Therefore, we shall use S(R̃) to denote both its expectation value and its typical value. 54 Note that L^{∞−}(Ω) := ∩_{p=1}^∞ L^p(Ω) denotes the algebra of classical random variables all of whose moments are bounded. It is larger than L^∞(Ω) and is in fact not a von Neumann algebra. We need to work with it instead of L^∞(Ω) because the Ginibre matrix has Gaussian entries, which have unbounded support but bounded moments. The resulting object is therefore only a *-probability space but not a W*-probability space. At the end of the day, since we only demand convergence in distribution, it is legitimate that the limit is described by a finite von Neumann algebra although the random matrices are not (cf. Appendix B). 55 More precisely, we demand the empirical spectral distribution (cf. Appendix B) of B_N to be ν_b^N. The basis here can be chosen arbitrarily because the Ginibre ensemble is both left- and right-unitarily invariant. 56 Here we treat the Ginibre matrix C_N in (3.52) as independent from the one used for (3.51). We do so because the replica trick GPI does not tell us how R̃_N and B̃_N are correlated, so we will not concern ourselves with their potential correlations. 57 Note that tr B_N = m_1(ν_b^N) = 1 with respect to the normalized trace, so unlike R_N, B_N is not a density matrix in the usual normalization convention.
When R is maximally mixed, R_N = P_N/k, so we have

R̃ = X_N B_N X_N^* , (3.54)

where X_N := P_N C_N/√k is a k × N Ginibre matrix with variance 1/k. This matches the random matrix model proposed by PSSY in their Appendix D (eqn. D.10). In this view, the PSSY model with a non-flat bulk entanglement spectrum simply amounts to conjugating the random matrix by the additional matrix R. Let us put the random matrix ensemble in the more familiar form in which a matrix model is usually written. According to PSSY, we can rewrite the large N limit of B_N in terms of the random JT Hamiltonian H_N found by [70] in the double scaling limit, where we used (3.8) and x(·) is the spectral function y(E) in (3.1) rescaled by 1/Z_1; the integral is over N × N Hermitian matrices H_N, with the measure controlled by a sequence of potentials (V_N(H_N))_N determined by (ν_b^N)_N. 58 We can then compute any spectral function E f(H) of the JT Hamiltonian via this matrix integral. On the other hand, we can write the expectation over the Ginibre ensemble as a Gaussian integral over the rectangular k × N matrix X_N. Say we are interested in a spectral function f(R̃), such as the von Neumann entropy. Putting things together, it can be evaluated using a combined matrix integral over H_N and X_N, where R is the given deterministic k × k bulk radiation density matrix.
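Expectations of spectral functions like E f(R̃) can also be estimated by direct Monte Carlo sampling of the ensemble. As a minimal illustration (our own, in the flat case with maximally mixed R and trivial B_N), the average purity E Tr R̃² of the normalized Ginibre model is close to the known average purity (k + N)/(kN + 1) of a random bipartite pure state.

```python
import numpy as np

rng = np.random.default_rng(5)
k, N, trials = 40, 800, 20

purities = []
for _ in range(trials):
    # flat model: k x N Ginibre matrix, normalized into a density matrix
    X = (rng.standard_normal((k, N)) + 1j * rng.standard_normal((k, N))) / np.sqrt(2)
    rho = X @ X.conj().T
    rho /= np.trace(rho).real
    purities.append(np.trace(rho @ rho).real)

avg_purity = float(np.mean(purities))
expected = (k + N) / (k * N + 1)   # average purity of a random bipartite pure state
```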

A generalized Page's model
The above random matrix models can also be packaged together in a random tensor network (RTN). We now take a slight detour to introduce some basics of RTNs. Generally, an RTN is used as a model for capturing the information-theoretic aspects of the AdS/CFT duality, and it is very successful in reproducing features like the Ryu-Takayanagi formula and the subregion-subregion duality [16,35,76-79]. An RTN is defined for a given connected undirected graph (V, E). For each edge-vertex pair (e, x), we assign a Hilbert space H_{e,x}. Attached to each edge e = (xy) ∈ E is a bipartite system H_e := H_{e,x} ⊗ H_{e,y} with dimension D_e, which we refer to as the bond dimension. An entangled pure state |φ_e⟩ ∈ H_e is assigned to each edge e ∈ E. Usually, they are chosen to be Bell states.

Figure 5: The PSSY model as a generalized Page's model. The left tensor diagram shows the RTN for the PSSY model with non-flat entanglement spectrum, where the red part indicates the projection to a Gaussian random vector. The middle diagram shows that it generalizes Page's model by allowing the two edges to have non-flat spectra. One can shift the operators all to the radiation system or the black hole system, so when tracing out either B or R, we obtain the consistent random matrix models deduced from our free convolution formula. The two phases described by the island formula correspond to making the blue cut or the orange cut, which entail an entropy contribution of S_BH or S(R) respectively.

The vertices are divided into bulk vertices V_b and boundary vertices V_∂. At each bulk vertex, we introduce a projection to a random Gaussian vector 59 |ψ_x⟩ ∈ ⊗_{e∼x} H_{e,x}, where e ∼ x denotes any edge e connected to x. In this way, the RTN defines a state in the boundary Hilbert space, (3.60). Because of the random Gaussian projection, |ρ_∂⟩ is only normalized in expectation, with small fluctuations at large bond dimensions.
The RTN of the PSSY model consists of two edges, one bulk vertex and two boundary vertices (cf. Fig. 5). The two edge states are given by where the Bell state |Ω⟩ has dimension N in both states, but their Schmidt ranks are |R| = k and |B| = N respectively. The density matrices B_N and R_N are diagonal in the basis used to define the Bell state Ω. They represent the purifications of the effective radiation density matrix R_N and of the effective black hole density matrix B_N. If a minimal (QES) surface cuts through one of the two bonds, it entails an entropy contribution of S(R_N) = S(R) or S_BH respectively. The random projection acts on the purifying systems R B. Then the RTN defines the state which equals either of the following two expressions, where C_N (or C'_N) is an N × N Ginibre ensemble. The equalities follow from manipulating the tensor diagrams in Fig. 5. 60 Let us denote its marginal states as B̃_N and R̃_N respectively, as usual for the fundamental description of the respective density matrices. It is evident that they match with (3.51) and (3.52) respectively. 61 This RTN is manifestly a generalization of the model proposed by Page to deduce the Page curve [10,11]. Page considered a microcanonical scenario, where the B_N state is flat and log N measures the entropy of the remaining black hole. Also, R_N is chosen to be flat. Then one simply has a random bipartite state on BR, and Page considered the entropy S(R̃_N) as one varies the relative sizes of N and k. Basically, the RTN model we described generalizes Page's model by allowing the two bond states to be non-flat.
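In the flat case, Page's estimate S ≈ log k − k/(2N) (for k ≤ N) is easy to reproduce numerically. The following minimal sketch, with hypothetical dimensions, draws random bipartite pure states and compares the average entanglement entropy of the smaller marginal to Page's formula:

```python
import numpy as np

rng = np.random.default_rng(1)

def page_entropy(k, N, trials=20):
    """Average entanglement entropy of the k-dimensional marginal of a
    random bipartite pure state on C^k (x) C^N (flat bond spectra)."""
    vals = []
    for _ in range(trials):
        psi = rng.standard_normal((k, N)) + 1j * rng.standard_normal((k, N))
        psi /= np.linalg.norm(psi)            # normalize the global pure state
        rho = psi @ psi.conj().T              # reduced state on the k-dim factor
        eigs = np.linalg.eigvalsh(rho).clip(1e-16)
        vals.append(-(eigs * np.log(eigs)).sum())
    return float(np.mean(vals))

k, N = 32, 512  # young-black-hole regime: k << N
S = page_entropy(k, N)
# Page's estimate for k <= N: S ~ log k - k/(2N)
print(S, np.log(k) - k / (2 * N))
```

The generalized model above replaces the two flat bond spectra with arbitrary ones, which is what breaks this clean formula.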
In this view, we should think of the two competing saddles as the competition between two "areas", with one given by the standard BH area term and the other given by the bulk entanglement between the brane and the radiation. The island formula describes the situation in which the radiation entropy can be described by a clean cut of one of the two bonds. This happens when the contribution coming from contracting the B system is separated from R, in the sense that the resulting spectrum ofR depends only on B N or R N but not both. We will later see how these separations are mathematically characterized by the one-shot entropies.
Generally, due to the non-flatness of the area spectrum, there isn't such a clean cut between the two sources of correlations. Nonetheless, the random Ginibre ensemble helps set the two sources free, so we can exploit this freeness to calculate the spectrum of (3.51). This is the general takeaway message from our work. See more discussions in Section 4. To what extent freeness is relevant and how useful it is in AdS/CFT shall be explored in future work.

Hawking's surprisal
Hopefully we have convinced you that the free convolution formula for the radiation entropy (3.23) doesn't entail the notion of generalized entropy. Nonetheless, it can actually be interpreted as a generalized entropy if formulated in the right way. Consider the following formulation of the generalized entropy in Wall's proof of the generalized second law [29]. Wall equates the generalized entropy associated with any cut γ on the event horizon of a (classically) stationary black hole as follows, 62 where ψ is the quantum state of the matter field in consideration, ψ_HH is the Hartle-Hawking state describing a black hole in thermal equilibrium, the relative entropy is evaluated for the subalgebra A_γ associated with the spacetime subregion external to γ, and A(+∞) is the horizon area at infinity. It 60 When moving the operators from B to R or vice versa, they generally pick up a transpose defined w.r.t. the basis chosen in defining the Bell state. The diagonal matrices B_N and R_N are not affected, so it only affects the Ginibre matrix. Nonetheless, since the entries of a Ginibre matrix are i.i.d., the transposed matrix is the same random ensemble as any N × N Ginibre ensemble. 61 Note that the RTN also determines how the reduced random density states are correlated, but we will ignore this aspect of the RTN model as the gravity calculation does not inform us about it. cf. Footnote 56. 62 The formula is valid for a classically stationary black hole for which the contribution of the classical area increase is turned off, which is the interesting case concerning the generalized second law.
is independent of the state ψ, so it drops out when we study the evolution of S_gen along the horizon. The relative entropy term is the important one. Its change packages together the variations of the entropy of the quantum fields and their back-reactions on the horizon area, and the generalized second law simply follows from its monotonicity under restriction to subalgebras associated with nested spacetime subregions [80-82].
The idea is that although the free convolution formula for the entropy (3.23) does not have the form of an area piece plus a bulk entropy piece, it could nevertheless match Wall's formula for the generalized entropy (3.65). This was also the idea recently used to show that in de Sitter spacetime, the entropy of any semiclassical state 63 matches the generalized entropy of the cosmological horizon [83].
We are interested in the relative entropy between the black hole state b̃_N and the black hole thermal state b_N. We can interpret this quantity as Hawking's surprisal in learning the actual black hole state b̃_N, given his original belief that the black hole is basically a canonical thermal state b_N. First of all, recall the definition (3.50), (3.66). Technically, since we do not have appropriately defined random variables for the black hole and radiation in the asymptotic limit, we consider the relative entropy S(b̃_N || b_N) at finite N, and then take the large N limit. At finite N, Umegaki's formula (2.17) of the relative entropy applies, as b̃_N and b_N are density operators in a W*-probability space (W, τ). The first term gives where we used the fact that b̃_N shares its spectral distribution with r̃_N. The second term gives where we used the replica trick for the logarithm, substituted in (3.50), and used freeness to compute the traces; in the last step we used τ(r_N) = 1, τ(cc*) = 1.
Together we have, where we added a pair of ± log N to each term to prepare for taking the large N limit. Now we take the large N limit. The second term in the bracket gives,

lim_{N→∞} S(r̃_N) + log N = S(r) + log N = S(R) , (3.71)

63 One can meaningfully assign an entropy to states because the operator algebra of observables in de Sitter is shown to be finite, more specifically, a Type II_1 factor. 64 Unlike entropies, which are spectral functions, the relative entropy depends not only on the spectra of b̃_N and b_N, but also on the relative phases between them. Hence, we have made a choice in defining b̃_N to be (3.50) that fixes the relative phase between b̃_N and b_N.
where the last step follows from (2.12).
The first term in bracket gives, which is nothing but the BH entropy (3.4).

Altogether we have,

−lim_{N→∞} S(b̃_N || b_N) = S(R) − S_BH . (3.73)
The minus relative entropy on the LHS could be interpreted as the black hole/radiation entropy measured with reference to the black hole background, and the RHS says that it simply equals the radiation entropy S(R) with the BH entropy subtracted. Putting (3.73) differently, we have

S(R) = S_BH − lim_{N→∞} S(b̃_N || b_N) , (3.74)

which is now a formula for the radiation entropy that doesn't explicitly involve free probability. Unlike the free convolution formula (3.23), (3.74) manifestly obeys the Bekenstein bound [84,85], i.e. S(R) ≤ S_BH, also known as the central dogma of black hole physics [8]. This follows from the positivity of the relative entropy. This formula also addresses the question we left in Section 3.3: how can one make sense of the black hole entropy in the large N limit in terms of b̃_N, which doesn't have a sensible large N limit? The answer is that we should simply compute Hawking's surprisal. Although both b̃_N and b_N at large N are no longer legitimate random variables in a finite von Neumann algebra, the limiting relative entropy, lim_{N→∞} S(b̃_N || b_N), survives and remains finite.
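To make the positivity statement concrete, Umegaki's formula can be implemented directly in finite dimension. The sketch below is generic (a toy thermal-like reference state, not the PSSY b_N) and merely illustrates that S(ρ||σ) ≥ 0, with equality for ρ = σ, which is what enforces S(R) ≤ S_BH in (3.74):

```python
import numpy as np

def mat_log(A):
    """Matrix log of a positive definite Hermitian matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w.clip(1e-18))) @ V.conj().T

def umegaki(rho, sigma):
    """Umegaki relative entropy S(rho || sigma) = Tr[rho (log rho - log sigma)]."""
    return float(np.trace(rho @ (mat_log(rho) - mat_log(sigma))).real)

rng = np.random.default_rng(4)
d = 16
E = np.sort(rng.standard_normal(d))             # toy energy levels
sigma = np.diag(np.exp(-E) / np.exp(-E).sum())  # thermal-like reference state
U = np.linalg.qr(rng.standard_normal((d, d)))[0]
rho = U @ sigma @ U.T                           # same spectrum, rotated basis

print(umegaki(rho, sigma), umegaki(sigma, sigma))  # nonnegative; zero for equal states
```

Note that the relative entropy is sensitive to the relative basis of the two states, not just their spectra, which is the point of Footnote 64.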
The limiting relative entropy already resembles the term in Wall's formula, where the reference state is also the thermal state of the black hole. In order to make the match more manifest, we can try to be more explicit about the limiting relative entropy with a formal construction. Let the GNS Hilbert space constructed from the tracial state τ be H̃_TFD. Any faithful normal state admits a representation as a cyclic and separating state vector. The state vector for the thermal state is a finite N version of the thermofield double state (i.e. the canonical purification of the thermal state). We can obtain other states from the action of the operators in W or its commutant. Then various inner products are given by (3.78). Another state vector we care about in H̃_TFD is the one for b̃_N. Then we can write the relative entropy equivalently using Araki's formula (2.16), where ∆_{ψ̃_N|ψ_N} is the relative modular operator. Now we follow a standard procedure to define the Hilbert space H_TFD in the large N limit. A pedagogical description is given in Section 3 of [86]. The basic idea is to apply the GNS construction for the state ψ_HH obtained from the limits of the finite N counterparts ψ_N. In particular, the inner product between a|ψ_HH⟩ and b|ψ_HH⟩ is obtained from the limit of the corresponding finite N inner products ⟨ψ_N|b* a|ψ_N⟩ = ψ_N(b* a). ψ_HH is called the thermofield double state, also known as the Hartle-Hawking (HH) state for a thermal black hole. Then we complete the algebra W to a von Neumann algebra acting on H_TFD in the weak topology. Unlike W, the so-obtained large N algebra A is generically infinite.
We can also repeat the procedure to define the large N limit of |ψ̃_N⟩. The constructed GNS Hilbert space would be the same as H_TFD, as long as for any a ∈ W there exists some b ∈ W such that the inner product ⟨aψ̃_N|bψ_N⟩ has a finite large N limit. We shall assume that this is the case.
Since all the inner products in H_TFD are defined via the limits, we have (3.82). In the PSSY model, the JT black hole is stationary, so S_BH can be matched with the constant term A(+∞)/4G_N in (3.65). Therefore, our formula (3.82), or the less explicit (3.74), matches Wall's formula (3.65) and thus can be viewed as a generalised entropy. Just like in (3.65), the relative entropy term is the more relevant one if we only care about the evolution of the radiation entropy, as in the Page curve for example. Here the relative entropy term contains implicitly the free convolution formula in its first argument.
For more realistic evaporating black holes, matter fields back-react on the geometry, creating area fluctuations. This is not captured by the PSSY model and would make the GPI analysis much harder. Therefore, being able to evaluate the exact radiation entropy of a realistic evaporating black hole using free multiplicative convolution is perhaps too much to hope for. The upshot of the relative entropy formula (3.82) is that it might be applicable in more general settings.

Sanity checks
We now perform some sanity checks on our result to show its consistency with other relevant results in the literature.

The island formula and the one-shot entropies
When is the island formula (1.5) valid? The general answer was proposed by AP, who claim that the QES prescription in AdS/CFT is only valid in regimes governed by the one-shot entropies [16]. These entropies characterize information processing tasks in the one-shot regime, where only one copy of the source is involved. This is in contrast to how the Shannon/von Neumann entropy is relevant in the asymptotic regime, where many identical and independent copies of the source are assumed.
In the context of the island formula in the PSSY model, the relevance of these one-shot entropies was first discussed in [17]. The main point is that the island formula only holds when a black hole is still young, such that the smooth min-entropy of its emitted radiation (in the effective description) is below S_BH − O(√β), or when a black hole gets old enough, such that the smooth max-entropy of the emitted radiation (in the effective description) is above S_BH + O(√β). We have shown that the free convolution formula gives an accurate value of S(R) in any regime. Therefore, the convolution formula should reduce to the island formula under the above one-shot entropic conditions. We have observed this phenomenon in Fig. 4, where the transitions occur at S_BH = 15, corresponding to the min-entropy, and S_BH = 35, corresponding to the max-entropy. These transitions are further smoothed out by the finite temperature corrections. Our goal in this subsection is to show that the convolution factorizes precisely under the above-mentioned entropic conditions. Instead of directly comparing the sizes of the summands to single out the dominant term, we resort to the generalized Page's model introduced previously that realizes this spectral distribution. 65 We first introduce the one-shot entropies, which for us specifically refer to the smooth min-entropy and the smooth max-entropy, 66 where B_ε(A) denotes the ε-ball around a density matrix A in the purified distance. 67 The non-smoothed min/max-entropies are the two extreme points of the Rényi α-entropies, for α = ∞ and α = 0 respectively. For an N × N density matrix A, both S^ε_max(A) and S^ε_min(A) are upper bounded by log N. Generally, one needs to deal with the conditional smooth entropies, but the setup in the PSSY model is relatively simple in that there is no quantum side information to condition on. So the above-defined one-shot entropies will suffice for our purpose.
The operational relevance of these one-shot entropies was first pointed out in [88]. 68 The smooth min-entropy measures the maximum amount of randomness, ε-close to uniform, that can be extracted from a single copy of an N × N density matrix A. More concretely, a random projection Π_min of dimension smaller than 2^{S^ε_min(A)} can typically produce a uniform outcome, i.e. N Π_min A Π_min ≈ Π_min, 69 hence extracting S^ε_min(A) bits of perfect randomness from a non-perfect source of log N bits. 65 This strategy is the same as how AP argued the relevance of the one-shot entropies in the corrected QES prescription [16] using RTNs and the decoupling theorems, whereas a direct comparison among the saddle contributions is carried out in a replica trick GPI derivation in [17]. 66 The max-entropy is usually defined with the Rényi-1/2 entropy rather than the Rényi-0 (Hartley) entropy that we use, as the former is preferred for the duality relations that are particularly useful when dealing with the conditional versions. We stick to the latter for its better intuition as measuring the rank of the state. In any case, their difference is often small after smoothing. cf. Tomamichel [87] for a comprehensive review. 67 The purified distance is a fidelity-based metric between density matrices defined as δ(A, B) := √(1 − F(A, B)), where the fidelity is defined as F(A, B) := (Tr √(√A B √A))². For our purposes, the trace distance would work just fine as well. 68 See the more comprehensive discussion including the conditional one-shot entropies in [89]. 69 It also follows from a special case of the one-shot decoupling theorem (cf. Lemma 4.5 in [90]) applied to a scenario without the reference system.
On the other hand, we can faithfully compress a copy of the N-dimensional state A into a "smaller" state of dimension 2^{S^ε_max(A)} up to an error of O(ε). More explicitly, we can project the state onto the spectral support of the largest 2^{S^ε_max} eigenvalues, such that the projected state Π_max A Π_max (with appropriate renormalization) remains ε-close to A. Note that such a projector Π_max commutes with A. Since the projected operator has its trace ε-close to 1, we shall neglect the normalization issue here.
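The truncations behind Π_min and Π_max can be mimicked classically. The following sketch implements a simple eigenvalue-truncation smoothing for the min/max entropies of a given spectrum; this is a stand-in for the optimization over the purified-distance ball in the actual definitions, not an exact implementation:

```python
import numpy as np

def smooth_min_max(eigs, eps):
    """eps-smoothed min/max entropies (in nats) of a classical spectrum,
    via simple eigenvalue truncation."""
    p = np.sort(np.asarray(eigs, dtype=float))[::-1]
    p = p / p.sum()
    cum = np.cumsum(p)
    # S_max^eps: log-rank of the smallest support carrying 1 - eps of the weight.
    s_max = np.log(np.searchsorted(cum, 1 - eps) + 1)
    # S_min^eps: discard the largest eigenvalues carrying eps of the weight,
    # then take -log of the largest survivor.
    survivors = p[cum > eps]
    s_min = -np.log(survivors[0]) if survivors.size else -np.log(p[-1])
    return s_min, s_max

# A non-flat spectrum: the gap S_max - S_min signals the intermediate regime
# where neither island phase cleanly dominates.
p = np.exp(-0.5 * np.arange(50)); p /= p.sum()
lo, hi = smooth_min_max(p, 0.05)
print(lo, hi)
```

For a flat spectrum the two values collapse onto log N, which is the regime where Page's original model applies.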
Consider now the random density matrix for the radiation (3.51). When S_min(B_N) − 1 > S^ε_max(R), for two independently chosen smoothing parameters, we can find a projector Π with rank M such that S_min(B_N) − 1 > M > S^ε_max(R) and which follows from M > S^ε_max(R). Note that R ∈ (M_k(C), Tr) and R_N ∈ (M_N(C), Tr) share the same values for any entropy. It follows that where we replace each R_N by Π√R_N Π and use the polar decomposition C_N = U|C_N| with a Haar random unitary U (cf. (2.59)). Then we have ΠU|C_N|B_N|C_N|U*Π ≈ Π under the random projection ΠU when M < S_min(B_N) − 1. To see how the condition follows, recall that the min-entropy depends only on the largest eigenvalue. Since |C_N|²/N = C_N C*_N/N asymptotes to a squared quarter-circular element, which has its spectral support over [0, 2], the min-entropy S_min(|C_N|B_N|C_N|/N) 70 differs from S_min(B_N) by at most one bit, and so does the smooth min-entropy by continuity. This explains the subtraction by one that we need to include in the condition above. 71 Therefore, we have so a young black hole satisfying the condition S_min(B_N) − 1 > S^ε_max(R) obeys the island formula (1.5). Generally, the approximation error that (3.85) entails is set by the smoothing parameter, O(ε).
When S_max(B_N) < S^ε_min(R), we simply invert the argument above and apply it to (3.52). There exists another projector Π with rank M such that S_max(B_N) < M < S^ε_min(R) − 1 and which follows from M > S_max(B_N). It follows that

(3.89)
70 The positive operator |C_N|B_N|C_N|/N is normalized to a good approximation at large k, so we can treat it as a density matrix. 71 This order-one effect is somewhat universal. It already appears in Page's original calculation as the leading correction term [10]. so an old black hole satisfying the condition S_max(B_N) < M < S^ε_min(R) − 1 obeys the island formula (1.5). Again, the approximation error that (3.88) entails is set by the smoothing parameters.
This is the general result addressing when one can compute the entropy of a boundary reduced state by making a minimal cut as in Fig. 5 in the generalized Page's model, without invoking the free convolution formula. While we assume the density matrix R to be arbitrary, we do know more about B_N, which is only parameterized by the temperature (assuming the brane tension is large, µ ≫ 1/β). In fact, B_N essentially behaves like a projector of dimension 2^{S_BH} in the following sense. In the PSSY model, the relevant parameter is β. The smooth min/max-entropies of B_N are roughly (3.90) for a smoothing parameter ∼ O(poly(β)). This near flatness is in fact a generic feature of a holographic state, such as a thermal black hole state. Its smooth min/max-entropy is close to its von Neumann entropy, which is of order 1/G_N, up to a subleading 1/√G_N correction [91]. In the PSSY model, the temperature plays the parametric role of G_N, and with dimensional analysis one can artificially introduce Newton's constant as β → βG_N.
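The near flatness of a thermal spectrum at small β can be seen in a toy example. The sketch below uses a hypothetical flat density of states (not the JT one) and simply compares the Rényi-∞, von Neumann, and Rényi-0 entropies of the Gibbs weights:

```python
import numpy as np

rng = np.random.default_rng(5)
N, beta = 5000, 0.05           # high temperature; hypothetical toy spectrum

E = rng.uniform(0.0, 1.0, N)   # toy stand-in for the energy spectrum
p = np.exp(-beta * E); p /= p.sum()

S_min = -np.log(p.max())             # Renyi-infinity (min-) entropy
S_vn = -(p * np.log(p)).sum()        # von Neumann entropy
S_max = np.log(N)                    # Renyi-0 (rank, max-) entropy
print(S_min, S_vn, S_max)            # all close to log N when beta is small
```

At small β the gap S_max − S_min is of order β, so the thermal state is indeed close to a projector of log-dimension S_BH, up to the subleading corrections discussed below.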
We thereby establish the claim made at the beginning of this subsection. The values of the smoothing parameters can be chosen freely so that one obtains the most useful estimate of the entropy S(R). For B_N, a good choice is ∼ O(poly(β)), which gives the estimate (3.90) up to an error of O(− log β). Such a finite temperature effect that smooths out the Page transition is well understood [6,92-94]. The correction is of order O(1/√β), which is a subleading effect compared to the leading corrections ∼ O(1/β) when the gap between the min-entropy and the max-entropy of the bulk radiation state is large, as considered in [16,17].
While this finite temperature correction is usually treated in parallel to the corrections due to the non-flat bulk spectrum, here we see that it is really the same phenomenon of enhanced replica symmetry breaking due to non-flatness. We treated them on an equal footing because an RTN models them equally. It is just that the non-flatness of the thermal spectrum is mild, so it only gives a subleading correction to the island formula, whereas the spectrum of the bulk state can be made arbitrary by quantum processing.

The result of PSSY
As a second sanity check, we now reproduce the result concerning the radiation spectrum in the original work of PSSY. They considered a maximally entangled bulk state: ν_r = δ_1 and m_n(ν_r) = 1 for all n, so the operator r is the identity. Then the radiation spectrum is given by the gravitational sector only, which is simply a compound Poisson distribution up to a rescaling. Its R-transform admits the general form of the Lévy-Khintchine formula (2.72), (3.92). The large N limit gives, To extract the distribution, we resort to the Cauchy transform via (2.31), which, up to the normalization convention used, matches PSSY's equation (2.34) derived from the Schwinger-Dyson equations for the resolvents.
To extract the spectral distribution, one needs to solve for the Cauchy transform. However, even in this case, a closed-form expression for the spectral distribution is not available. PSSY offered an approximate solution, and we now make a similar approximation by deriving it directly from our general convolution formula.
The approximation starts by working with the regulated probability distributions. Let us set N = k in (3.13) as a first approximation, where we used a simple approximation in the second line that the term π = η dominates the sum. This is because |π| ≤ |η| = n and n ↦ m_n(ν_{b_k}) is increasing for large k. In the last step, we used (3.6), and E_k is defined via In order to make an explicit comparison with PSSY's result for the radiation spectrum, we need to switch to their normalization convention, (3.97), where D(x) denotes the spectral distribution of the radiation state in their convention.
On the other hand, we have so it follows that where we used the properties of delta functions to bring it to a form that matches PSSY's equation. We remark that we used a different regularization scheme from PSSY in making the approximation. Instead of using a smooth integrable distribution ρ_k as in (3.6), they simply truncate ρ(E) at E_k, where ∫_0^{E_k} ρ(E) dE = k, and replace the contribution from E ∈ (E_k, +∞) by a delta function that gives a lowest possible spectral value λ_0. In our scheme, λ_0 is replaced by x(E_k). Nonetheless, the regularization schemes only differ at large E, for which x(E) lies in a small exponential tail. The qualitative result should not depend on the exact scheme used in making the approximation.

The result of Cheng et al on random tensor networks
We mentioned in our introduction that a similar result was obtained recently in the context of random tensor networks by Cheng et al [35]. They considered a generalized RTN model where the edges are no longer maximally entangled but rather have an arbitrary entanglement spectrum, modeling the area fluctuations that one would have in AdS/CFT. In the case where there are two competing minimal cuts γ_{1,2}, the transition between them is no longer sharp. The details of the entropy of the reduced state on Γ ⊂ V_∂ depend on the spectra µ_{1,2} of the bonds cut through by γ_{1,2}. This is essentially the same phenomenon that we are investigating for the PSSY model, as was made obvious in the previous subsection. There is, however, one conceptual difference between our starting points, and it is somewhat pedantic. The RTN is by construction finite dimensional, and the above-mentioned result is formulated for a sequence of RTNs with increasing bond dimensions and converging spectral distributions at the two cuts. The free probability description is thus invoked via Voiculescu's theorem. We, however, do not have the convenience of a random matrix/tensor model as the fundamental description. Therefore, we take the first-principles rationale that the GPI in the planar limit is the fundamental description, and then we do not need to worry about the convergence issues. Technically, this means we shall understand the GPI in terms of free random variables rather than random matrices.
One can certainly question the validity of the planar limit, and indeed it is, after all, a compromise for the analytic tractability of the problem, because we do not have a controlled way to account for the small fluctuations due to the nonplanar diagrams. So we opt for the pragmatic approach of working in the planar limit, and we think it is conceptually cleaner, because we can translate the entire GPI calculation into the framework of non-commutative probability theory.
Alternatively, if one takes the random matrix model, which can be deduced from the relevant free random variables, as the fundamental description for the PSSY model, one has essentially the same setup as the RTN model in Cheng et al. Then the free probabilistic calculation only serves as a leading order approximation. Heuristically, how well it approximates a finite dimensional RTN also gauges how well justified the planar approximation is.
Besides that, our results in Section 3.1 and their results in Section 3.2 are almost identical. At first sight, it might not be obvious that the results actually agree. The main technical difference comes from the fact that we are dealing with the thermal distribution in the large N limit, which is itself not a proper probability distribution. Nonetheless, if we focus on our result at finite N (3.15), then a comparison can be made. Consider the operator we had in (3.43), whose non-zero part gives (3.15), (3.100). We can reorder the product without changing its moments into the following product of two free factors, r_N · c b_N c* (3.101), where r_N has the spectral distribution ν_{r_N} (3.42). The second factor shares its spectrum with b_N c*c = b_N q², where we again have a product of two free factors. Recall that a squared quarter-circular element q² has the distribution (2.56), which we denoted as MP(1) following Cheng et al., and b_N has the spectral distribution ν_{b_N} (3.6). Then the moments of (3.100) are shared with the moments of and its moments define the distribution which matches exactly with their Theorem 3.4 in [35].
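The moment matching behind this identification can be checked numerically at low order. The sketch below uses hypothetical spectra for r_N and b_N (our own toy choices) and compares the empirical second moment of r_N · c b_N c* to the freeness prediction, using m_2(ν_b ⊠ MP(1)) = m_2(b) + m_1(b)² and the free-product identity τ(abab) = m_2(a)m_1(b)² + m_1(a)²m_2(b) − m_1(a)²m_1(b)²:

```python
import numpy as np

rng = np.random.default_rng(6)
N, trials = 400, 20

# Hypothetical non-flat spectra for the two free factors, tau-normalized to 1.
r = rng.exponential(size=N); r *= N / r.sum()        # spectrum of r_N
b = rng.uniform(0.5, 1.5, size=N); b *= N / b.sum()  # spectrum of b_N

m2 = lambda x: np.mean(x ** 2)
# Freeness prediction: m2 of nu_b boxtimes MP(1) is m2(b) + 1, and
# multiplying by the free factor r_N gives m2(r) + (m2(b) + 1) - 1.
m2_pred = m2(r) + m2(b)

m2_emp = 0.0
for _ in range(trials):
    C = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
    D = C @ np.diag(b) @ C.conj().T   # shares its spectrum with b_N q^2
    X = np.diag(r) @ D                # the operator r_N . c b_N c*
    m2_emp += np.trace(X @ X).real / N / trials

print(m2_emp, m2_pred)
```

The agreement at N = 400 reflects the asymptotic freeness of the deterministic diagonal factor and the unitarily invariant conjugated factor; higher moments can be checked the same way.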

Takeaway: Freeness is good
We presented a concrete solution to the PSSY model with non-flat bulk entanglement spectrum using ideas from free probability theory. We showed that the free probabilistic framework is perhaps the most appropriate language for understanding the model, bridging the replica trick GPI calculation on one hand and the random matrix model on the other. Aside from the solution to this particular problem, the main conceptual takeaway from our work is that freeness is a useful feature that can be exploited in studying how gravity processes quantum information, in parallel to the more familiar notion of independence. Though certainly more subtle and less familiar than tensor independence, it is worth the attention. By making the PSSY model a bit more complicated, we are able to see freeness manifest in gravity that is otherwise hidden when we only consider a flat entanglement spectrum. As we mentioned, this has to do with the mathematical fact that the identity is trivially free from any other random variable, and this is the only situation where freeness and independence coincide. Therefore, a flat entanglement spectrum, with the reduced density operator being an identity, hides the fact that the degrees of freedom associated with two candidate surfaces in transition are not always independent but rather always free.
Generally, it is hard to characterize quantum correlations in gravity, just like in any generic many-body quantum system. We usually rely on the independence between subsystems in order to make definitive statements. For instance, the reference that encodes any information that falls into a black hole is decoupled from, i.e. independent of, the radiation or the remaining black hole before or after the Page time respectively. This allows us to tell where the information is encoded, so we can in principle retrieve it from there.
Over the years, we have made much progress in understanding how gravity stores and processes quantum information, starting from the seminal works of Page [10,11] and Hayden-Preskill [95] towards the more recent quantum error correction with complementary recovery [96-98], (random) tensor networks [76,78,91,99-101] and entanglement wedge reconstruction as state-merging [16]. From an information-theoretic perspective, an overarching technical core underlying these results is the decoupling theorems [90,102-104], which characterize when we have independence between certain subsystems. Then Uhlmann's theorem [105] tells us how the quantum correlations are constrained, so that we can characterize the whereabouts of the information. Another example is the quantum de Finetti theorem [106-109], which uses the permutation symmetry of a large collection of systems to deduce the independence among subsystems. This idea was recently used to explain the tension between the recent replica trick calculation of the Page curve and Hawking's original calculation [110].
On the other hand, as we have emphasized in Section 2.3, freeness is a distinct notion of independence. In contrast to the standard tensor independence, which concerns commuting random variables, freeness is a notion of independence tailor-made for non-commuting random variables. Technically, it is defined via the free product instead of the tensor product. Like the decoupling theorem, the de Finetti theorem and Uhlmann's theorem, freeness comes with its own tricks that we can utilize. There is a whole package of free probabilistic tools that one can use to quantitatively characterize the information, as we have seen in action.
More importantly, freeness and independence are not mutually exclusive in applications, as they work under different conditions and concern different objects. The work by Cheng et al is another good example of this. While freeness can be used to give precise results when the spectral distributions at the candidate minimal cuts converge, one can still make useful claims about the boundary reduced state using the decoupling approach when the convergence is not assumed (cf. Section 4 in [35]).
From the RTN perspective, the intuition is that the random tensors not only facilitate the decoupling among different bulk subregions, but also set the information encoded in different QESs free from each other. It would be interesting to explore the possibility of putting them to work in synergy in future works.
In quantum theory, which is usually treated as a non-commutative probability theory of observables, we are used to the notion of independence as captured by commuting observables. We have now seen that it is also possible and fruitful to accommodate freeness in quantum theory, and especially in quantum gravity. We hold the vision that freeness can be very useful in quantum gravity. One hint comes from the free independence between two perturbations of a black hole separated by more than the scrambling time [111]. We are also largely motivated by the fact that various recent GPI calculations call for random matrix models [70,[112][113][114]. In fact, free random variables have already appeared in the study of large N matrix models [115][116][117]. Free probability is very relevant here because of its intimate connection with random matrix theory, building on the remarkable insight of Voiculescu that independent random matrices tend to be free in the large N limit. The theory of higher order freeness has also been developed to study the fluctuations of random matrices [118][119][120][121].
We therefore believe that freeness should be ubiquitous in gravity, provided that we look in the right directions.

A Additivity violation of the minimum channel output entropy
In this appendix, we discuss how an old black hole encodes a channel that violates the additivity of minimum channel output entropy.
In quantum information theory, understanding the quantum channel capacity has fundamental significance. It provides fundamental insights into how much advantage quantumness can offer over classical resources. One specific problem concerns the capacity of a quantum channel for transmitting classical information: can entanglement help, given access to multiple copies of the channel? It was conjectured that quantum entanglement does not enhance the channel capacity, i.e. that the classical capacity of a quantum channel is additive. The additivity was verified for many concrete channels, but the general question remained open until a counterexample was found.
The conjecture was not disproved directly. The counterexample concerns an equivalent problem, the additivity of the minimum channel output entropy $S_{\min}$,72 which we now briefly introduce.73 Shor showed that the additivity of the classical capacity is equivalent to, among several other equivalent statements, the additivity of $S_{\min}$ for any two channels $\mathcal{N}$, $\mathcal{M}$ [30].
This was disproved by Hastings [31], who showed the violation for some N-dimensional channel $\mathcal{N}$ and its conjugate channel $\bar{\mathcal{N}}$.74 We've been using free probability to analyze the black hole information problem, whereas randomized constructions are in fact ubiquitous in quantum information. The rationale of injecting randomness dates back to Shannon's founding work proving the channel coding theorem using random coding [122]. Hastings' counterexample is also a random construction, and the existence of additivity violation is argued probabilistically. He didn't find an explicit channel that gives the violation; rather, $\mathcal{N}$ is a random unitary channel of the form $\mathcal{N}(\,\cdot\,) = \sum_{i=1}^{d} p_i U_i (\,\cdot\,) U_i^*$ with independently sampled Haar random unitaries $U_i$, weighted by some carefully chosen probability vector $p_i$ that is roughly uniform. Then Hastings showed that in the regime $1 \ll d \ll N$, some (or in fact any typical) instance of the random channel ensemble $\mathcal{N}$ violates additivity as in (A.3).
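To make the structure of Hastings' construction concrete, here is a minimal numerical sketch (toy parameters of our choosing, not Hastings' actual regime): a random mixing-of-unitaries channel with d Haar unitaries in dimension N ≫ d sends any pure input to a rank-d state with a nearly flat spectrum, so its output entropy is close to log d rather than log N.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 400, 6  # illustrative stand-in for the regime 1 << d << N

def haar_unitary(n):
    """Sample a Haar random unitary via QR of a complex Ginibre matrix."""
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # phase fix for Haar measure

U = [haar_unitary(N) for _ in range(d)]
p = np.full(d, 1.0 / d)  # roughly uniform probability vector

psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)

# channel output on a pure input: sum_i p_i U_i |psi><psi| U_i*
rho_out = sum(pi * u @ np.outer(psi, psi.conj()) @ u.conj().T
              for pi, u in zip(p, U))

ev = np.linalg.eigvalsh(rho_out)
ev = ev[ev > 1e-12]                    # rank-d output
S_out = -np.sum(ev * np.log(ev))       # output entropy close to log d << log N
print(len(ev), S_out, np.log(d))
```

Since the d vectors $U_i\psi$ are nearly orthogonal at large N, the output entropy is pinned near log d for every input, which is the small-output-entropy mechanism behind the violation.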
Hastings' work was built upon the earlier work of Hayden-Winter, who disproved the Rényi versions of the conjecture [123] using a similar random construction with random isometries in the Stinespring representation.
There are many follow-ups to the Hastings-Hayden-Winter (HHW) breakthrough [32,[124][125][126][127][128], but they almost all use the same family of random channels with minor variations. Unfortunately, no concrete counterexamples nor extensive violations have been found.
Let us mention a particular variant of the HHW-type construction that has been analyzed using free probability [32].75 Consider the following random channel mapping the N-dimensional quantum state space to itself, where we have n independent N × N Ginibre matrices $C_i$, and T is a rectification operator that ensures the channel is trace-preserving. Note that T almost surely converges to the identity at large d, so the rectification is a technicality that is only relevant at finite d and can be approximately ignored at large d.
72 Here $S_{\min}$ should not be confused with the min-entropy. We apologize for the clash of notations. 73 This is a convex optimization problem, so the inputs can be restricted to pure states w.l.o.g. 74 The conjugate channel of a channel $\mathcal{N}(\,\cdot\,) := \sum_i K_i (\,\cdot\,) K_i^*$ is defined as $\bar{\mathcal{N}}(\,\cdot\,) := \sum_i \bar{K}_i (\,\cdot\,) \bar{K}_i^*$. 75 I thank Tony Metger for pointing out this reference to me.
In the regime (A.6), it is shown that typical instances of the random channel E (together with its conjugate channel $\bar{E}$) violate additivity with a small gap [32]. The proof builds on the earlier work by Collins [127]. Note that (A.5) is a variant of the HHW-type counterexample because Ginibre matrices can be thought of as partially-traced Haar random unitaries/isometries (cf. [127] and the discussion in Section 4.1 in [32]). Note also that the output of E has a nearly flat spectrum (cf. Theorem 3.1 in [127] for a precise statement). This is a universal feature of the HHW-type construction: the output of the channel is very close to a maximally mixed state of the maximal possible rank.76 It is somewhat ironic that the only family of channels that we know to have superadditive capacity has almost no practical use.
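The nearly flat output spectrum can be checked numerically. In the toy sketch below (sizes of our choosing; the rectification T is dropped, as justified above at large dimension), the output of the Ginibre channel on a pure state is a rank-n state whose nonzero eigenvalues all concentrate around 1/n.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 4  # ambient dimension much larger than the number of Ginibre terms

# n independent N x N Ginibre matrices (i.i.d. complex Gaussian entries)
C = [rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)) for _ in range(n)]

psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)

# output on the pure input, dropping T: sum_i C_i |psi><psi| C_i*.
# Its nonzero spectrum equals that of the n x n Gram matrix of the vectors C_i psi.
V = np.stack([c @ psi for c in C], axis=1)  # N x n
gram = V.conj().T @ V
gram /= np.trace(gram).real                  # normalize to unit trace

ev = np.linalg.eigvalsh(gram)
print(ev)  # all n eigenvalues close to 1/n: nearly flat spectrum of rank n
```

The n Gaussian vectors $C_i\psi$ have nearly equal norms and nearly vanishing overlaps at large N, which is exactly the "maximally mixed state of the maximal possible rank" behavior quoted above.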
Consider now the spectrum of Hawking radiation in a microcanonical ensemble for a very old black hole. Namely, we are in the regime $k \gg 2^S$, where we assume $k/2^S$ is an integer. The microcanonical spectral distribution of the radiation reads (3.22). Now we look for a quantum channel that takes the bulk radiation density operator $r$ as input, such that the output has its spectrum described by $\nu_{\tilde r}$ to a good approximation at large dimension N. Following the discussion in 3.3.1, a density operator with this spectrum is $\tilde r$, where c is a circular element and p is a free compression of trace $\tau(p) = 2^S/k$. However, when we model $\tilde r$ in random matrices, it doesn't take the form of a trace-preserving channel from $r$ to $\tilde r$. We need to consider something else. Let's restrict to the non-zero part of $\nu_{\tilde r}$ and ask what channel has output spectrum described by $\mu_{k/2^S,\,\nu_r}$ up to normalization. This is the spectrum of $\sum_{i=1}^{k/2^S} c_i\, r\, c_i^*$ for $k/2^S$ free circular elements $\{c_i\}_{i\in[k/2^S]}$. We can put in a normalization factor to obtain a completely positive trace-preserving map on $(W, \tau)$, $\mathcal{E}(\,\cdot\,) := \frac{2^S}{k}\sum_{i=1}^{k/2^S} c_i (\,\cdot\,)\, c_i^*$. Using again the connection between random matrices and free probability, we can model $\mathcal{E}(\,\cdot\,)$ by $E(\,\cdot\,)$ in (A.5) at large N. Then in the regime (A.7), which is identified with (A.6) through $k/2^S = d$, $k = N$, one would find that this channel violates the additivity of minimum output entropy.
Therefore, we conclude that the HHW-type random channel (A.5), which was originally constructed purely for technical convenience in approaching the additivity conjecture, can be physically encoded by an evaporating black hole at late time. However, it still begs the question of what the physical meaning of the channel $x \mapsto E(x)$ is. Could it be interpreted as a holographic map from bulk to boundary in some appropriate sense?
76 It means that either the channel itself or its complement channel is close to a fully depolarizing channel. It is known that a channel is additivity-violating if and only if its complement channel is also additivity-violating [129].
It's important to remark that this violation is only visible here because we sum over all possible geometries, even though most of them are subleading and usually deemed unimportant at late time. If we only care about entropies, the island formula is good enough, as the GPI at late time is dominated by the symmetric replica wormhole saddle alone. This is in accordance with the fact that the channel E is only slightly different from a depolarizing channel that outputs a maximally mixed state of rank $d = k/2^S$. However, it is exactly this subtle difference that is responsible for the violation in the HHW counterexamples. Therefore, in order to see the additivity violation from an old black hole, taking replica symmetry breaking into account is indeed indispensable.
It's plausible that realistic black holes at late time encode information via a more sophisticated (random) channel such that one can hope for more counterexamples to additivity or even large extensive violations. This possibility is indeed contemplated in [130] by relating it to the existence of certain entanglement properties of black hole microstates. However, as we learnt in the PSSY model, one perhaps needs to wield the full GPI in order to extract such channels, which remains technically challenging for realistic black holes.

B More on random matrices
In this appendix, we introduce the notion of random matrices and their spectral distributions. We shall see how random matrices make contact with free probability.

B.1 Preliminaries
We have the probability space given by a Kolmogorov triple $(\Omega, \mathcal{F}, P)$. The sample space Ω is equipped with a σ-algebra $\mathcal{F}$ and a probability measure P. We can simply take $\mathcal{F}$ to be generated by the relevant random variables. Consider the matrix elements of an N × N matrix A as random variables. They are defined as measurable functions, $A_{ij} : \omega \mapsto A_{ij}(\omega)$, from Ω to $\mathbb{C}$ equipped with the Borel σ-algebra. Then we define the random matrix A as $A : \Omega \to M_N(\mathbb{C}),\ \omega \mapsto A(\omega)$. We can view it as the joint random variable of its matrix elements $(A_{ij})_{i,j=1}^N$ mapping from Ω to $\mathbb{C}^{N^2}$ equipped with the Borel σ-algebra. We can then pushforward P to define the probability distribution $\nu(A_{ij})$ of the matrix elements on $\mathbb{C}$.
We are particularly interested in the spectra of random matrices. For an N × N matrix A with eigenvalues $\{\lambda_i(A)\}_{i=1}^N$, we can define its spectral distribution, which is a probability measure on $\mathbb{C}$, as an average of the Dirac masses on the eigenvalues, $\nu_A := \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i(A)}$. Often we need to consider a sequence of random matrices of size $N \in \mathbb{N}$, denoted as $(A_N)_N$. For technical convenience,77 we assume these random matrices are all defined on the same probability space $(\Omega, \mathcal{F}, P)$.
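As a small illustration, the spectral distribution defined above is just the uniform probability measure on the eigenvalues; the Python sketch below (helper names are ours) represents it by its atoms and weights and integrates a test function against it.

```python
import numpy as np

def esd(A):
    """Spectral distribution of a square matrix A:
    nu_A = (1/N) * sum_i delta_{lambda_i(A)},
    returned as (atoms, uniform weights)."""
    lam = np.linalg.eigvals(A)
    return lam, np.full(len(lam), 1.0 / len(lam))

def integrate_esd(A, f):
    """Integrate a test function f against nu_A."""
    lam, w = esd(A)
    return np.sum(w * f(lam))

A = np.diag([1.0, 2.0, 2.0, 5.0])
# integrating f(x) = x gives the normalized trace, here (1+2+2+5)/4 = 2.5
print(integrate_esd(A, lambda x: x))
```

Integrating monomials this way recovers the moments $\frac{1}{N}\mathrm{tr}\,A^n$, which is the bridge to the moment language of free probability used below.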
Let the eigenvalues of $A_N$ be measurable functions from Ω to $\mathbb{C}$, $\lambda_i(A_N) : \omega \mapsto \lambda_i(A_N(\omega))$. Let the set of probability measures on $\mathbb{C}$ be $M(\mathbb{C})$. Then we define the empirical spectral distribution (ESD) of $A_N$ as a measurable function from Ω to $M(\mathbb{C})$, which is a probability measure-valued random variable.78 For a random variable, we can discuss different notions of convergence as we send N → ∞. In particular, we need the notion of almost sure convergence: a sequence of classical (scalar) random variables $(X_N)_N$ on $\mathbb{C}$ converges almost surely to some random variable X if $P(\lim_{N\to\infty} X_N = X) = 1$. Let us now restrict to Hermitian matrices. For a sequence of ESDs $(\nu_{A_N})_N$ of Hermitian matrices $(A_N)_N$, we are interested in the cases where the sequence converges to a deterministic limit.
The strongest convergence is almost sure convergence. We say a sequence of ESDs $(\nu_{A_N})_N$ converges almost surely to a measure $\nu_a \in M(\mathbb{R})$ if for any bounded continuous function $f : \mathbb{R} \to \mathbb{R}$, the scalar random variable $\int f(\lambda)\, d\nu_{A_N}(\lambda)$ converges almost surely to the value $\int f(\lambda)\, d\nu_a(\lambda)$. In other words, we almost surely have the ESDs $(\nu_{A_N})_N$ converge to $\nu_a$ in the weak topology. Then we call $\nu_a$ the limiting spectral distribution (LSD) of the random matrix ensemble. For classical random variables, almost sure convergence implies convergence in probability, which in turn implies convergence in distribution.79 For a sequence of ESDs, convergence in probability is defined similarly to almost sure convergence up to changing the sense of convergence, whereas convergence in distribution should correspond to the notion of convergence in expectation. It concerns the expected ESD $\mathbb{E}\nu_{A_N}$, defined via $\int f\, d(\mathbb{E}\nu_{A_N}) := \mathbb{E}\int f\, d\nu_{A_N}$ for all bounded continuous functions $f : \mathbb{R} \to \mathbb{R}$. Then we say $(\nu_{A_N})_N$ converges in expectation to some probability measure $\nu_a$ on $\mathbb{R}$ if the sequence of deterministic distributions $(\mathbb{E}\nu_{A_N})_N$ converges weakly to $\nu_a$, i.e. $\lim_{N\to\infty}\int f\, d(\mathbb{E}\nu_{A_N}) = \int f\, d\nu_a$ for all bounded continuous functions $f : \mathbb{R} \to \mathbb{R}$. Of course, almost sure convergence implies convergence in expectation. In summary, for a sequence of ESDs, almost sure convergence implies convergence in probability, which implies convergence in expectation.
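The convergence of ESDs can be watched numerically. The toy sketch below (our own illustration, not from the text) samples single GUE matrices of growing size and checks that the low moments of the ESD approach the semicircle moments (the Catalan numbers), as the convergence of ESDs predicts along a single sequence of samples.

```python
import numpy as np

rng = np.random.default_rng(2)

def gue_esd_moment(N, k):
    """k-th moment of the ESD of one GUE(N) sample, normalized so that
    the LSD is the semicircle law on [-2, 2]."""
    H = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    H = (H + H.conj().T) / (2 * np.sqrt(N))
    lam = np.linalg.eigvalsh(H)
    return np.mean(lam ** k)

# semicircle moments: m_2 = Catalan(1) = 1, m_4 = Catalan(2) = 2
for N in (100, 400, 1600):
    print(N, gue_esd_moment(N, 2), gue_esd_moment(N, 4))
```

Each printed pair drifts toward (1, 2) as N grows; the fluctuations around the limit shrink, which is the numerical shadow of the weak convergence discussed above.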

B.2 Making contact with free probability
To see how the asymptotic limit of large random matrices makes contact with free probability, we would like to abstract away the Kolmogorov triple and resort to the non-commutative probability space $(L^{\infty-}(\Omega) \otimes M_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr})$, which is a $*$-probability space (cf. Footnote 54). We now discuss the convergence of random matrices in terms of the convergence of non-commutative random variables.
78 The notion of an ESD is perhaps confusing at first encounter, essentially because we have two types of randomness put together. It becomes conceptually more transparent when we work with a non-commutative probability space as in Section 2. 79 For a sequence of classical (scalar) random variables $(X_N)_N$ on $\mathbb{R}$, it converges in probability if for any $\varepsilon > 0$, $\lim_{N\to\infty} P(|X_N - X| > \varepsilon) = 0$; and it converges in distribution if $\lim_{N\to\infty} \mathbb{E}f(X_N) = \mathbb{E}f(X)$ for any bounded continuous function $f : \mathbb{R} \to \mathbb{R}$, i.e. the sequence of distributions $\nu_{X_N}$ converges weakly to $\nu_X$.
Note that because the ESD is itself a probability measure-valued random variable, the convergence of ESDs entails two notions of convergence: the almost sure convergence we've been discussing concerns the random variable itself, while we also demand each instance to converge weakly as a probability measure. The latter is also known as convergence in distribution for the random variables that carry these instances of probability measures. Since we've stashed away the Kolmogorov triple and opted for $(L^{\infty-}(\Omega) \otimes M_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr})$ to describe random matrices, taking the expectation leaves us with the deterministic distributions $\mathbb{E}\nu_{A_N}$; the only notion of convergence available is convergence in expectation, i.e. convergence in distribution for the respective random variables in $L^{\infty-}(\Omega) \otimes M_N(\mathbb{C})$. Concretely, for a sequence of random matrices $(A_N)_N \in (\mathcal{A}_N, \tau_N)_N$, we can now talk about its limit as N → ∞ as some non-commutative random variable a in $(\mathcal{A}, \tau)$ as long as the moments match asymptotically. Equivalently, when the ESDs $(\nu_{A_N})_N$ converge in expectation to $\nu_a$, we can identify $\nu_a$ as the (deterministic) spectral distribution of some $a \in (\mathcal{A}, \tau)$.
With the appropriate notion of convergence, we can now discuss when large random matrices become free from each other.
Definition 7 (Asymptotic freeness). A collection of sequences of non-commutative random variables (a N ;1 ) N , . . . , (a N ;d ) N ∈ (A N , τ N ) N is asymptotically free if they converge in distribution to some elements a 1 , . . . , a d respectively in some non-commutative probability space (A, τ ) such that the collection a 1 , . . . , a d is free.
When we choose $(\mathcal{A}_N, \tau_N)_N$ to be $(L^{\infty-}(\Omega) \otimes M_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr})_N$, asymptotic freeness implies that the ESDs $\nu_{A_N}$ converge in expectation to $\nu_a$. Hence, asymptotic freeness for random matrices is a property defined at the level of expectations, capturing a relatively weak sense of asymptotic freeness. Asymptotic freeness does not only concern random matrices, due to the flexibility in choosing the non-commutative probability spaces $(\mathcal{A}_N, \tau_N)_N$. While one can choose $(\mathcal{A}_N, \tau_N)_N$ to be $(L^{\infty-}(\Omega) \otimes M_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr})_N$ for N × N random matrices, we can also discuss asymptotic freeness for sequences of deterministic matrices in $(M_N(\mathbb{C}), \mathrm{tr})_N$ when they converge in distribution to elements in $(\mathcal{A}, \tau)$. We can exploit this to define the stronger notion of almost sure asymptotic freeness for random matrices.
Note that here we demand the elements in (M N (C), tr) under fixed ω ∈ Ω to converge in distribution for almost all ω, which implies the almost sure convergence of the ESDs. Therefore, almost sure asymptotic freeness is stronger than asymptotic freeness.
More explicitly, consider again (2.18) for any two sequences of random matrices $(A_N)_N$ and $(B_N)_N$, each admitting an asymptotic spectral distribution $\nu_a$ and $\nu_b$ respectively. If for any finite collection of integers $\{m_i, n_i\}_i \geq 0$ the corresponding random variable in (2.18) converges almost surely, then the two sequences are almost surely asymptotically free. In $(L^{\infty-}(\Omega) \otimes M_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr})_N$, asymptotic freeness is equivalent to the statement that any sequence of mixed cumulants, as defined via (2.20), vanishes in the limit N → ∞; and almost sure asymptotic freeness is equivalent to the statement that the mixed cumulants vanish almost surely in the limit N → ∞. We can now state Voiculescu's theorem, which is central in bridging random matrix theory and free probability. Let us revisit the matrix model proposed in Section 3.3.2. We claimed that $\tilde{R}_N$ and $\tilde{R}$ model $\tilde{r}_N$ and $\tilde{r}$, in the sense that the ESD of the former is well approximated by the spectral distribution of the latter at large fixed values of N and k. We want to argue this by leveraging Voiculescu's theorem (Theorem 2). However, an obstacle is that we are not given a sequence of random matrices in (3.51), but rather a random matrix of fixed size. Without a sequence of random matrices, Voiculescu's theorem doesn't seem to be applicable here.
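As a numerical check of asymptotic freeness (again a toy illustration of our own), one can verify that mixed moments of two independent GUE matrices factorize the way freeness dictates, e.g. $\mathrm{tr}(A^2 B^2) \approx \mathrm{tr}(A^2)\,\mathrm{tr}(B^2)$, with an error that shrinks as N grows.

```python
import numpy as np

rng = np.random.default_rng(3)

def gue(N):
    """GUE matrix normalized so that the LSD is the semicircle on [-2, 2]."""
    H = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    return (H + H.conj().T) / (2 * np.sqrt(N))

def tr(x):
    """Normalized trace, the matrix analogue of the trace tau."""
    return np.trace(x).real / x.shape[0]

for N in (100, 400, 1600):
    A, B = gue(N), gue(N)  # independent, hence asymptotically free
    # freeness forces the mixed moment to factorize in the large N limit
    print(N, tr(A @ A @ B @ B) - tr(A @ A) * tr(B @ B))
```

The printed differences are exactly (a simple instance of) the vanishing mixed cumulants described above; for higher alternating words the same centering-and-factorization pattern applies.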
In order to make contact with free probability, a standard solution is to "blow up" a finite matrix, say the deterministic k × k radiation density matrix R given to us, by tensoring with an identity factor of dimension L, thereby creating a sequence of kL × kL matrices $(R_{kL})_L := R \otimes I_L$ by adjusting L.81 Note that the ESD of $(R_{kL})_L$ remains the same and thus trivially converges to $\nu_r$ as L → ∞.
80 A unitarily invariant ensemble means the joint distribution of the matrix elements is invariant under any unitary conjugation. 81 More generally, consider a (random) matrix polynomial involving standard independent random matrix ensembles, such as GUE, GE, CUE, and some deterministic matrices. One simply replaces the deterministic ones by some non-commutative random variables $\{d_i\}$ that share the same spectral distributions. Then the corresponding polynomial in $\{d_i\}$ and the free semi-circular, circular or Haar elements is called the free deterministic equivalent, and it serves as a good approximation at large N. This procedure is often used to deal with (random) matrix polynomials involving single deterministic matrices or non-converging sequences of deterministic matrices. In our case, for example, r is a free deterministic equivalent of $(R_{kL})_L$. For a precise formulation of free deterministic equivalents, see Chapter 10.1 in Mingo-Speicher [21] and [64,131,132].
Consider now the matrix model (3.51). We can shift the factor N from $C_N$ to $R_N$ without changing the overall expression. Let's rewrite it as $\sqrt{N R_N}\, C_N B_N C_N^* \sqrt{N R_N}$, where we have $\mathbb{E}\,\mathrm{tr}\, B_N = 1$, $\mathrm{tr}\, N R_N = 1$, and $C_N$ has variance 1/N instead of 1 as in (3.51). Then we blow up $B_N$ and $N R_N$ to sequences $(B_{NL})_L := B_N \otimes I_L$ and $(R_{NL})_L := N R_N \otimes I_L$. The Ginibre ensemble can be chosen to be of any shape, so we can take a sequence of NL × NL Ginibre matrices $(C_{NL})_L$.
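The blow-up trick is easy to sanity-check: tensoring with an identity factor repeats every eigenvalue L times and therefore leaves the ESD untouched. A small Python sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
k, L = 5, 3

# a fixed k x k density matrix R
X = rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k))
R = X @ X.conj().T
R /= np.trace(R).real

# "blow up" to a kL x kL matrix by tensoring with an identity factor
R_blow = np.kron(R, np.eye(L))

lam = np.sort(np.linalg.eigvalsh(R))
lam_blow = np.sort(np.linalg.eigvalsh(R_blow))
# each eigenvalue of R appears L times in R_blow, so the ESD
# (uniform measure on eigenvalues) is identical for all L
print(np.allclose(np.repeat(lam, L), lam_blow))
```

This is why the sequence $(R_{kL})_L$ trivially converges to $\nu_r$: every member of the sequence has literally the same ESD.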
Since any unitary acting from the left or right can be absorbed by the Ginibre ensemble, we can conjugate $(B_{NL})_L$ by a Haar random unitary for free. It then follows that $(C_{NL})_L$, $(B_{NL})_L$ and $(R_{NL})_L$ are all independently distributed, and in any pair at least one ensemble is unitarily invariant. Also, each ensemble admits an LSD. All the conditions of Voiculescu's theorem are thus satisfied, and almost sure asymptotic freeness follows.
As sequences of non-commutative random variables in $(L^{\infty-}(\Omega) \otimes M_{NL}(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr})_L$, we then have the convergence in the sense of Definition 6: $(C_{NL})_L \to c$, $(B_{NL})_L \to b_N$ and $(R_{NL})_L \to r_N$, such that $c, b_N, r_N$ are free from each other. This gives a concrete random matrix realization of the abstract non-commutative random variables we invoked in Section 3.3.1. In the next subsection, we discuss why the random matrix models we proposed capture the desired limiting spectral distributions.

B.3 Sample covariance matrix and separable sample covariance matrix
It now makes sense to ask for the LSD of any polynomial in a collection of asymptotically free random matrix ensembles. When each random matrix ensemble admits an LSD, the answer is given by the distribution of the corresponding polynomial in the non-commutative random variables in the large N limit. If the random matrices are almost surely asymptotically free, then the ESD of any polynomial almost surely converges weakly to the LSD. For the precise statement and its proof, see Lemma II.4.1 in [60].
For example, we are mostly interested in the product of two asymptotically free random matrices. Given two almost surely asymptotically free sequences of random positive matrices, the ESD of their product almost surely converges weakly to the free multiplicative convolution of their LSDs.82
82 The same also holds for the free additive convolution.
We now look at the sample covariance matrix ensemble that models the free compound Poisson distribution, and the separable sample covariance matrix that models our free convolution formula (1.9).
The free Poisson distribution is modeled by the Wishart ensemble. Let $X_N$ be N × M rectangular random matrices with i.i.d. complex Gaussian entries of zero mean and variance α/M. Consider the Wishart matrix $W_N = X_N X_N^* \in (L^{\infty-}(\Omega) \otimes M_N(\mathbb{C}), \mathbb{E} \otimes \mathrm{tr})$; its LSD is the Marchenko-Pastur distribution $\mu^{MP}_{\lambda,1}$ with $\lambda = N/M$ at α = 1. We can use free probability to obtain this result. Now take an independent sequence of positive matrices $(A_N)_N$ admitting an LSD ν. Since $A_N$ is independent from $X_N$, $X_N$ is unitarily invariant and so is $W_N = X_N X_N^*$, and both $W_N$ and $A_N$ admit the LSDs $\mu^{MP}_{\lambda,1}$ and ν respectively, they are almost surely asymptotically free as implied by Voiculescu's theorem (Theorem 2). The LSD of $\sqrt{A_N}\, X_N X_N^* \sqrt{A_N}$ is then the free multiplicative convolution $\mu^{MP}_{\lambda,1} \boxtimes \nu$, i.e. the free compound Poisson distribution.83 84 Its free cumulants follow from a short computation, where we used the formula (2.39) in the first line; in the second line we used the formula for free cumulants (B.14) with α = 1; and in the last step we used $|\pi| + |K(\pi)| = n + 1$. Hence, the free cumulants of the LSD are given by $\kappa_n = \lambda^{-1} m_n(\nu)$, which yields (B.18).
83 This formula holds in general for the ESDs $\nu_{AB}$ and $\nu_{BA}$ of two rectangular matrices A and B with sizes N × M and M × N respectively. 84 This result was established by Silverstein and Bai for $X_N$ being the more general Wigner ensemble [133].
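The Marchenko-Pastur statement can be probed numerically. The sketch below (toy sizes of our choosing, with the unit-scale normalization $\mathbb{E}|X_{ij}|^2 = 1/M$, i.e. α = 1 above) samples one Wishart matrix and compares the first two moments of its ESD with the Marchenko-Pastur moments $m_1 = 1$ and $m_2 = 1 + \lambda$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 1000, 2000
lam = N / M  # aspect ratio of the MP law mu^MP_{lam,1}

# N x M matrix with i.i.d. complex Gaussian entries, E|X_ij|^2 = 1/M
X = (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))) / np.sqrt(2 * M)
W = X @ X.conj().T  # Wishart / sample covariance matrix

ev = np.linalg.eigvalsh(W)
m1, m2 = np.mean(ev), np.mean(ev ** 2)
# Marchenko-Pastur moments: m_1 = 1, m_2 = 1 + lam
print(m1, m2)
```

Higher moments can be matched the same way against the free Poisson moment formulas, which is a quick way to sanity-check cumulant computations like the one above.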
As for the sample covariance matrix (2.67), we can also consider the matrix ensemble that shares the same non-zero spectrum as (3.41), known as the separable sample covariance matrix, where we have a sequence of additional independently distributed random N × N matrices $(D_N)_N$ that admits an LSD ξ, and $C_N$ is an N × N Ginibre matrix. We assume w.l.o.g. that M = N by allowing $D_N$ to be rank-deficient. $(D_N)_N$ is almost surely asymptotically free from the sample covariance matrix $C_N^* A_N C_N$ according to Voiculescu's theorem. It follows from the above result on the sample covariance matrix that the LSD of the separable sample covariance matrix is given by the free multiplicative convolution of ξ with the LSD of the sample covariance matrix. In terms of the non-commutative random variables, we have the corresponding convergence in distribution, where $(D_N)_N \to d$, $(A_N)_N \to a$, and c is a circular element in some $W^*$-probability space $(W, \tau)$.
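To illustrate the separable sample covariance ensemble numerically (a toy sketch with simple two-atom spectra of our choosing, and assuming the symmetrized form $\sqrt{D_N}\, C_N^* A_N C_N \sqrt{D_N}$, which shares the non-zero spectrum of the ensemble above), the first moment of the ESD factorizes into the product of the first moments of ξ and ν, as expected for free factors.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 800

# deterministic factors with simple two-atom spectra
A = np.diag(np.where(np.arange(N) < N // 2, 1.0, 3.0))  # nu = (delta_1 + delta_3)/2
D = np.diag(np.where(np.arange(N) < N // 2, 2.0, 0.0))  # xi = (delta_0 + delta_2)/2
# note D is rank-deficient, as allowed above

# N x N Ginibre matrix with E|C_ij|^2 = 1/N
C = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2 * N)

# separable sample covariance matrix (symmetrized form)
S = np.sqrt(D) @ C.conj().T @ A @ C @ np.sqrt(D)
ev = np.linalg.eigvalsh(S)

# first moment factorizes for free factors: m_1 -> m_1(xi) * m_1(nu) = 1 * 2 = 2
print(np.mean(ev))
```

Matching higher moments against the free multiplicative convolution of ξ with the compound-Poisson LSD works the same way, mirroring the verification carried out for the matrix models below.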
Now we can verify that our matrix models in Section 3.3.2 correctly model the random variables in Section 3.3.1. We let $D_N$ be the rescaled N × N embedded radiation density matrix $N R_N$, and $A_N$ be the black hole (random) thermal density matrix $B_N$. Then the ensemble (B.21) matches exactly with $\tilde{R}_N$ in (3.51).
Based on the discussion at the end of the last subsection, we fix N to be some large value and use L as the varying parameter for the matrix size. We blow up $N R_N$ to a sequence, and define a sequence that extends (3.51) to the large L limit, where at L = 1 we have $N R_N$. We need to put in the factor N to adjust the normalization. Let the free deterministic equivalent of $N R_N$ be $r_N$ (cf. Footnote 81), in the sense that $(R_{NL})_L \to r_N$. Then (B.23) implies that $\tilde{R}_N$ (3.51) models $\tilde{r}_N$ (3.43), in the sense that $(\tilde{R}_{NL})_L \to \tilde{r}_N$. More explicitly, almost surely the sequence of ESDs $(\nu_{\tilde{R}_{NL}})_L$ converges weakly to $\nu_{\tilde{r}_N}$. Similarly, the arguments apply for $\tilde{B}_N$ and $\tilde{b}_N$.
Lastly, we consider the large N limit. We have $\tilde{R}_N \to \tilde{R}$ and $p_N \tilde{r}_N p_N \to \tilde{r}$, because almost surely the sequence of ESDs $(\nu_{P_{NL} \tilde{R}_{NL} P_{NL}})_L$ converges weakly to $\nu_{p_N \tilde{r}_N p_N}$. If, for any bounded continuous function f on $\mathbb{R}$, the convergence of the expectation value of f evaluated with the distributions $(\nu_{P_{NL} \tilde{R}_{NL} P_{NL}})_L$ is uniform in N, we can then swap the limits to conclude that $\tilde{R}$ models $\tilde{r}$.