On the critical-subcritical moments of moments of random characteristic polynomials: a GMC perspective

We study the 'critical moments' of subcritical Gaussian multiplicative chaoses (GMCs) in dimensions $d \leq 2$. In particular, we establish a fully explicit formula for the leading order asymptotics, which is closely related to large deviation results for GMCs and exhibits a similar universality feature. We conjecture that our result correctly describes the behaviour of analogous moments of moments of random matrices, or more generally of structures that are asymptotically Gaussian and log-correlated across the entire mesoscopic scale. This is verified in an integer case in the setting of the circular unitary ensemble, extending and strengthening the results of Claeys et al. and Fahs to higher-order moments.


Moments of moments of random characteristic polynomials
There has been a good deal of interest in recent years in understanding asymptotically log-correlated Gaussian fields and their extremal processes. In the context of random matrix theory, these structures arise from the study of the logarithms of characteristic polynomials of different ensembles, such as Haar-distributed random matrices from the classical compact groups, the unitary ensemble of random Hermitian matrices and the complex Ginibre ensemble. This has led to a line of investigation into the so-called moments of moments, which capture multifractal properties of the underlying characteristic polynomial as well as non-trivial information about its maxima.
To motivate our discussion, let us consider the circular unitary ensemble (CUE). Suppose $U_N$ is an $N \times N$ Haar-distributed unitary matrix and $P_N(\theta) = \det(I - U_N e^{-i\theta})$ is the associated characteristic polynomial. For $k, s > 0$, we denote by
$$\mathrm{MoM}_{U(N)}(k, s) := \mathbb{E}_{U(N)}\left[\left(\frac{1}{2\pi}\int_0^{2\pi} |P_N(\theta)|^{2s}\, d\theta\right)^k\right]$$
the (absolute) moments of moments (MoMs) of $P_N(\theta)$. We shall be interested in the behaviour of $\mathrm{MoM}_{U(N)}(k, s)$ as $N \to \infty$. Due to the multifractality of the random characteristic polynomials, this quantity exhibits a phase transition as one varies the values of $(k, s)$, and it was conjectured in [25] that
$$\mathrm{MoM}_{U(N)}(k, s) \overset{N \to \infty}{\sim} \begin{cases} \left(\frac{G(1+s)^2}{G(1+2s)}\right)^k \frac{\Gamma(1 - ks^2)}{\Gamma(1 - s^2)^k}\, N^{ks^2}, & k < 1/s^2, \\ c(k, s)\, N^{k^2s^2 - k + 1}, & k > 1/s^2, \end{cases} \qquad (1.1)$$
where $G(\cdot)$ is the Barnes G-function and $c(k, s)$ is some non-explicit constant depending on $(k, s)$. Much progress has been made in establishing the asymptotics (1.1), especially when $k, s$ are both positive integers, in which case combinatorial approaches are possible. In this setting the pair $(k, s)$ always lies in the supercritical regime $k > 1/s^2$, and the corresponding asymptotics was first verified in [6]. An alternative combinatorial proof was also given in [2], where a connection to the Painlevé V equation was established for $c(2, s)$, $s \in \mathbb{N}$.
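Since the definition above is completely explicit, it can be probed numerically. The following Monte Carlo sketch (our own illustration, not part of the original analysis; the function names and all parameter choices are ours) samples Haar-distributed unitary matrices via the QR decomposition of a complex Ginibre matrix with the standard phase correction, and estimates $\mathrm{MoM}_{U(N)}(k, s)$ by discretising the $\theta$-integral:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary_eigs(n):
    """Eigenvalue angles of a Haar-distributed n x n unitary matrix (QR of Ginibre)."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    q = q * (d / np.abs(d))          # phase correction so that q is exactly Haar
    return np.angle(np.linalg.eigvals(q))

def mom_estimate(n, k, s, n_samples=200, n_grid=128):
    """Monte Carlo estimate of MoM_{U(n)}(k, s)."""
    thetas = 2 * np.pi * np.arange(n_grid) / n_grid
    total = 0.0
    for _ in range(n_samples):
        phis = haar_unitary_eigs(n)
        # |P_N(theta)| on the grid, with P_N(theta) = prod_j (1 - e^{i(phi_j - theta)})
        absP = np.abs(np.prod(1.0 - np.exp(1j * (phis[None, :] - thetas[:, None])), axis=1))
        total += np.mean(absP ** (2 * s)) ** k
    return total / n_samples

m = mom_estimate(8, 2, 0.5)
```

For $(k, s) = (2, 1/2)$ the pair lies in the subcritical regime $k < 1/s^2 = 4$, where (1.1) predicts growth of order $N^{ks^2} = N^{1/2}$.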
For positive integers $k$ and general $s$ satisfying $s > -1/4$, most results have been established with the help of the asymptotics of Toeplitz determinants obtained by the Riemann-Hilbert approach. The case where $k = 2$ was treated in [14], where the asymptotic formulae in both regimes in (1.1) were established alongside the connection of $c(2, s)$ to the Painlevé V equation. As for general $k \in \mathbb{N}$, the subcritical case $s^2 < 1/k$ was fully established in [23], but only a weaker result $\mathrm{MoM}_{U(N)}(k, s) = \Omega(N^{k^2s^2 - k + 1})$ was obtained in the supercritical regime.
Common to all the aforementioned works is the crucial assumption that $k$ is a positive integer, so that one can expand the moments and obtain
$$\mathrm{MoM}_{U(N)}(k, s) = \frac{1}{(2\pi)^k}\int_{[0, 2\pi]^k} \mathbb{E}_{U(N)}\left[\prod_{j=1}^k |P_N(\theta_j)|^{2s}\right] d\theta_1 \cdots d\theta_k. \qquad (1.2)$$
From there, the problem becomes the analysis of the asymptotics for the cross moment in the integrand, as well as its behaviour as one integrates it over $[0, 2\pi]^k$. It is not clear, however, how one may analyse the cross moment when the $\theta_j$'s are arbitrarily close to each other (except when $k = 2$ [14]), let alone extend the investigation of moments of moments to non-integer values of $k$. Our goal here is to address this situation through the lens of Gaussian multiplicative chaos (GMC). Before we continue, let us briefly mention the works [1,7,8], where analogous moments of moments are studied in the context of the classical compact groups, an approximate model for the Riemann zeta function, and the branching random walk via the combinatorial approach, as well as the article [13], where the results in [23] are translated to Toeplitz+Hankel determinants, yielding similar weak asymptotics for the classical compact groups.

Perspective from the theory of multiplicative chaos
The theory of GMCs studies the construction and properties of a random multifractal measure formally defined as the exponentiation of log-correlated Gaussian fields. First proposed as a model of turbulence in the 60's, GMCs have attracted a lot of attention in the last two decades due to their central role in the mathematical formulation of Polyakov's path integral theory of random surfaces, also known as Liouville quantum gravity [19,22]. The significance of this theory lies in the fact that it describes (at least conjecturally) the scaling limits of a wide class of models in statistical mechanics, and is closely related to other 'canonical' geometric objects in the continuum with conformal symmetries, see e.g. [21,44] and the references therein. Thanks to techniques from conformal field theory and complex analysis, finer results have been obtained in recent years including various exact formulae for Liouville correlation functions and related quantities [31,36,41,42].
In a different direction, GMCs have been used as a tool to study the geometry of the underlying fields, and new applications in other areas of mathematics have also been studied over the last couple of years. For example, it is now known that the characteristic polynomials of many different random matrix ensembles behave asymptotically like GMC measures as the size of the matrix goes to infinity [45,11,34,35,12,26,15]. We would like to exploit such connections to GMCs, and explore the implications of seemingly unrelated exact integrability results from Liouville quantum gravity in the context of random matrix theory.
For illustration, let us revisit our CUE example. It has been shown in [35] that when $s \in (0, 1)$, the sequence of random measures
$$\frac{|P_N(\theta)|^{2s}}{\mathbb{E}_{U(N)}|P_N(\theta)|^{2s}}\, d\theta \qquad (1.3)$$
converges in distribution (and in the weak* topology) to the GMC measure $M_\gamma(d\theta) = e^{\gamma X(e^{i\theta}) - \frac{\gamma^2}{2}\mathbb{E}[X(e^{i\theta})^2]}\, d\theta$, where $\gamma = \sqrt{2}s$ and $X$ is the Gaussian free field on the unit circle,
$$\mathbb{E}[X(x)X(y)] = -\log|x - y|, \qquad x, y \in \partial\mathbb{D} = \text{unit circle}, \qquad (1.4)$$
as the size of the matrix $N$ goes to infinity. When $k < 1/s^2 = 2/\gamma^2$, the $k$-th moment of the GMC total mass exists and is given by the Fyodorov-Bouchaud formula [24,36]
$$\mathbb{E}\left[\left(\frac{1}{2\pi}\int_0^{2\pi} M_\gamma(d\theta)\right)^k\right] = \frac{\Gamma(1 - k\gamma^2/2)}{\Gamma(1 - \gamma^2/2)^k}. \qquad (1.5)$$
It follows from the work of Keating and Snaith [32] that
$$\mathbb{E}_{U(N)}|P_N(\theta)|^{2s} = \prod_{j=1}^N \frac{\Gamma(j)\Gamma(j + 2s)}{\Gamma(j + s)^2} \overset{N \to \infty}{\sim} \frac{G(1+s)^2}{G(1+2s)}\, N^{s^2}.$$
Combining both results, we arrive at
$$\mathrm{MoM}_{U(N)}(k, s) \overset{N \to \infty}{\sim} \left(\frac{G(1+s)^2}{G(1+2s)}\right)^k \frac{\Gamma(1 - ks^2)}{\Gamma(1 - s^2)^k}\, N^{ks^2}, \qquad (1.6)$$
which matches the first asymptotics in (1.1), i.e. the regime $k < 1/s^2$, $0 < s < 1$ is essentially resolved.
The focus of this article is to extend the analysis beyond the regime where GMC moments exist, particularly when $k$ takes the critical value. In the CUE example this corresponds to the situation where $k\gamma^2/2 = ks^2 = 1$ with $s^2 < 1$. A naive substitution of $k = 1/s^2$ into (1.6) results in the blow-up of the factor $\Gamma(1 - ks^2)$, justifying the need for a different asymptotic formula in this setting. It is widely conjectured that $\mathrm{MoM}_{U(N)}(1/s^2, s)$ grows like a constant multiple of $N \log N$, but nothing is known rigorously except for $k = 1/s^2 = 2$, where the precise $N\log N$ asymptotics was established in [14, Theorem 1.15]. In general, the best known result to date is $\mathrm{MoM}_{U(N)}(k, 1/\sqrt{k}) = \Omega(N \log N)$, which was established for positive integers $k$ in [23], and to our knowledge there have been no conjectures giving an explicit formula for the leading coefficient. While one can no longer directly rely on the convergence of GMC in [35] to derive an asymptotic formula for $\mathrm{MoM}_{U(N)}(k, s)$ in the regime where $ks^2 \geq 1$, one may still consider the following GMC heuristic: if $(X_\epsilon(\cdot))_{\epsilon > 0}$ is a collection of continuous Gaussian fields approximating the free field $X$ at scale $\epsilon \asymp N^{-1}$, then one expects
$$\mathrm{MoM}_{U(N)}(k, s) \approx \mathbb{E}\left[\left(\frac{1}{2\pi}\int_0^{2\pi} e^{\gamma X_\epsilon(\theta) - \frac{\gamma^2}{2}\mathbb{E}[X_\epsilon(\theta)^2]}\, d\theta\right)^k\right]. \qquad (1.8)$$
We shall therefore consider 'critical moments' of (regularised) subcritical GMCs (1.8) in this article, and study their asymptotics as $\epsilon \to 0^+$.
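Schematically, the subcritical matching described above can be recorded in one line (our own bookkeeping of the displays above, valid for $k < 1/s^2$ and $0 < s < 1$):

```latex
\mathrm{MoM}_{U(N)}(k,s)
  = \mathbb{E}_{U(N)}\Big[\Big(\frac{1}{2\pi}\int_0^{2\pi}|P_N(\theta)|^{2s}\,\mathrm{d}\theta\Big)^{k}\Big]
  \approx \big(\mathbb{E}_{U(N)}|P_N(\theta)|^{2s}\big)^{k}\,
    \mathbb{E}\Big[\Big(\frac{1}{2\pi}\int_0^{2\pi}M_\gamma(\mathrm{d}\theta)\Big)^{k}\Big]
  \overset{N\to\infty}{\sim}
    \Big(\frac{G(1+s)^2}{G(1+2s)}\Big)^{k}
    \frac{\Gamma(1-ks^2)}{\Gamma(1-s^2)^{k}}\,N^{ks^2},
\qquad \gamma=\sqrt{2}\,s,
```

where the last step uses $\gamma^2/2 = s^2$ to match the Fyodorov-Bouchaud factor $\Gamma(1 - k\gamma^2/2)/\Gamma(1-\gamma^2/2)^k$ with $\Gamma(1 - ks^2)/\Gamma(1-s^2)^k$.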

Setting and main result
For generality we consider $d \leq 2$, so that our result is potentially relevant not only to one-dimensional models like the classical compact groups, but also to two-dimensional ones such as the complex Ginibre ensemble [48], as well as other asymptotically log-correlated structures in other probability problems. Let $X$ be a log-correlated Gaussian field on some bounded open domain $D \subset \mathbb{R}^d$ with covariance
$$\mathbb{E}[X(x)X(y)] = -\log|x - y| + f(x, y), \qquad x, y \in D, \qquad (1.9)$$
where $f$ is some continuous function on $D \times D$. For $\gamma^2 < 2d$, the Gaussian multiplicative chaos associated to $X$ is defined as the weak* limit
$$M_\gamma(dx) := \lim_{\epsilon \to 0^+} M_{\gamma,\epsilon}(dx), \qquad M_{\gamma,\epsilon}(dx) := e^{\gamma X_\epsilon(x) - \frac{\gamma^2}{2}\mathbb{E}[X_\epsilon(x)^2]}\, dx, \qquad (1.10)$$
where $(X_\epsilon)_{\epsilon > 0}$ is some sequence of continuous Gaussian fields on $D$ that converges to $X$ as $\epsilon \to 0^+$ in a suitable sense. The covariance structure of such approximate fields is typically of the form
$$\mathbb{E}[X_\epsilon(x)X_\epsilon(y)] = -\log(|x - y| \vee \epsilon) + O(1),$$
and it is an established fact in the GMC literature that the limiting measure $M_\gamma$ is independent of the choice of approximation (see Theorem 2.10 in Section 2.3 for a version for the mollification construction, or [27] for a generalisation). It is also well known that the total mass of $M_\gamma$ possesses moments of all orders strictly less than $p_c := 2d/\gamma^2$, and for convenience let us call the pair $(p_c, \gamma)$ critical-subcritical. In particular, if $g \geq 0$ is a non-trivial continuous function on $D$, then the sequence
$$\mathbb{E}\left[\left(\int_D g(x)\, M_{\gamma,\epsilon}(dx)\right)^{p_c}\right] \qquad (1.11)$$
blows up as $\epsilon \to 0^+$ by Fatou's lemma. Our first theorem provides a precise description of this behaviour when the underlying approximate fields $X_\epsilon$ are constructed via mollification.
Theorem 1.1. Let $d \leq 2$ and $\gamma \in (0, \sqrt{2d})$, and consider the regularised GMC measures (1.10) with $X_\epsilon := X * \nu_\epsilon$, where $X$ is the log-correlated Gaussian field in (1.9) and $\nu_\epsilon(\cdot) := \epsilon^{-d}\nu(\cdot/\epsilon)$ for some mollifier $\nu \in C_c^\infty(\mathbb{R}^d)$. Suppose $g \geq 0$ is a non-trivial continuous function on $D$. Writing $p_c = 2d/\gamma^2$, we have the asymptotic formula (1.12), in which the constant $C_{\gamma,d}$ is the reflection coefficient of GMC, explicitly given by (1.13).
The expression (1.12) is reminiscent of the tail expansion of subcritical GMCs: it was shown in [47] (under an extra regularity condition on $f$) that the tail probability of $\int_D g\, dM_\gamma$ admits the power-law expansion (1.14) as $t \to \infty$. This should not be entirely surprising, given that the blow-up of the $p_c$-th moment is intrinsically related to the power-law asymptotics of GMCs, and these results together suggest that (1.14) may also hold for the approximate GMC $M_{\gamma,\epsilon}$, for $t = t(\epsilon)$ tending to infinity not too quickly relative to $\epsilon \to 0^+$. We also see the recurring theme of universality for these large-deviation type problems: the asymptotics depend only on the diagonal of the covariance $f(u, u)$ and not on any further specification of the underlying log-correlated field. The asymptotic formula (1.12) also looks relatively robust and should hold beyond the mollification setting, e.g. when $X_\epsilon$ is constructed by truncating a series expansion of $X$. We have the following corollary.
Corollary 1.2. Assume the same setting as Theorem 1.1, except that $(X_\epsilon)_{\epsilon > 0}$ is now any collection of continuous Gaussian fields satisfying the following two conditions:
• there exists some $C > 0$ such that $\sup_{x, y \in D} \left|\mathbb{E}[X_\epsilon(x)X_\epsilon(y)] + \log(|x - y| \vee \epsilon)\right| \leq C$ for all $\epsilon > 0$;
• for any $\delta > 0$, $\lim_{\epsilon \to 0^+} \sup_{|x - y| \geq \delta} \left|\mathbb{E}[X_\epsilon(x)X_\epsilon(y)] - \mathbb{E}[X(x)X(y)]\right| = 0$.
Then the asymptotics in (1.12) remain true. (Here a non-trivial $g$ means $\int_D g > 0$, and by mollifiers we mean functions $\nu \in C_c^\infty(\mathbb{R}^d)$ such that $\nu \geq 0$ and $\int \nu = 1$.)
On the circular unitary ensemble. We are able to establish the following asymptotics for Haar-distributed unitary matrices.
Theorem 1.3. Let $k \geq 2$ be any integer. As $N \to \infty$, we have the asymptotics (1.15).
Let us quickly explain how (1.15) follows from the GMC heuristics. Since the log-characteristic polynomial behaves asymptotically like a log-correlated Gaussian field in the entire mesoscopic regime, we anticipate that the GMC approximation is accurate, and hence expect the corresponding asymptotics to hold for any $k = 1/s^2 > 1$. Based on the discussion in Section 1.2, it is reasonable to believe that the suitable log-correlated field for the approximation is the Gaussian free field on the unit circle (1.16), in which case the $f$-function appearing in (1.9) is simply zero on the diagonal. Meanwhile, the comparison with $\gamma = \sqrt{2}s$ suggests the choice $\epsilon = \epsilon(N) = \Omega(N^{-1})$. The asymptotics (1.15) then follows by combining all these considerations with Theorem 1.1. When $k \geq 2$ is an integer, this heuristic argument can be made rigorous, since we may consider the expanded moments (1.2) and show that the CUE cross moments and their Gaussian counterparts behave similarly as we integrate over $[0, 2\pi)^k$. This will be verified carefully in Section 3.4. Unfortunately this strategy does not work for non-integer $k$, but we nevertheless make the following conjecture.
Conjecture 1.4. The asymptotics (1.15) hold for any $k > 1$.

Interpretation of results: mesoscopic statistics
Our main result of course does not cover the asymptotics for $\mathbb{E}[(\int_D g(x)\, M_{\gamma,\epsilon}(dx))^p]$ in the 'supercritical' cases where
• $\gamma^2 > 2d$ and $p = 2d/\gamma^2$; or
• $\gamma \in \mathbb{R}$ but $p > 2d/\gamma^2$.
We would like to emphasise that this restriction is not due to any technical reason, but rather to our motivation for (1.11) as an accurate toy model that provides the correct leading order.
As we shall see in the proofs, the GMC heuristic (1.8) is highly plausible in the critical-subcritical setting as long as the original model behaves asymptotically like a log-correlated Gaussian field in the entire mesoscopic scale. Going beyond this phase, the leading order will depend on the microscopic behaviour, and there is no hope of identifying the correct leading coefficients even if our GMC toy model still successfully captures the correct order. In the context of the CUE, the microscopic statistics are described by the sine kernel, and it should not be surprising that the present Gaussian analysis does not provide the right answer for the leading order coefficients. Interested readers may find more discussion in Appendix C, where we sketch some of the calculations in the supercritical case and explain the connections between the leading order asymptotics and the microscopic regime.
The same GMC heuristic may be applied to other models in probability, and we would like to mention two other examples in random matrix theory.
Circular beta ensemble (CβE). For $\beta > 0$ and $N \in \mathbb{N}$, the CβE$_N$ is a point process of $N$ particles $\theta_1, \ldots, \theta_N$ on the unit circle with joint density proportional to
$$\prod_{1 \leq i < j \leq N} \left|e^{i\theta_i} - e^{i\theta_j}\right|^\beta.$$
When $\beta = 1, 2, 4$ this corresponds exactly to the eigenvalue distributions of random matrices sampled from the circular orthogonal, unitary and symplectic ensembles respectively, but a general construction is also available for arbitrary $\beta > 0$ in [30]. If we define the 'characteristic polynomial' $P_N(\theta) = \prod_{j=1}^N (1 - e^{i(\theta_j - \theta)})$ as before, then we know from [33] that $\log|P_N(\theta)|$ converges, in the sense of finite-dimensional distributions as $N \to \infty$, to a suitable multiple of $X(\cdot)$, the Gaussian free field on the unit circle in (1.4). Partial progress has been made towards proving the convergence to GMCs in the same article [33]; see also [15], where a different connection between CβE characteristic polynomials and GMCs was established.
Not much is known about the asymptotics for moments of moments of CβE models in general, since the underlying point process is no longer determinantal for $\beta \neq 2$ and powerful techniques such as Riemann-Hilbert analysis no longer apply. The only work in this direction is the recent article [3], where the asymptotics are established for part of the supercritical regime $\beta < 2ks^2$ with integer parameters $k$ and $s$, based on symmetric function theory.
We now consider the 'critical-subcritical' regime based on the GMC heuristics and our main theorem. Let us recall from the work of Keating and Snaith [32] the asymptotics of the moments $\mathbb{E}|P_N(\theta)|^{2s}$ in the CβE setting. Using the leading order asymptotics derived in [5, Lemma 4.17], we arrive at the following conjecture.
Remark 1.6. At first glance it is not obvious how (1.17) coincides with our earlier conjecture regarding the CUE when $\beta = 2$, but this may be deduced from an equivalent and simpler representation of $\Upsilon_\beta(\cdot)$ for rational values of $\beta$; see [5, Lemma 7.1].
Classical compact groups. Following the spirit of previous works on unitary matrices, the convergence of characteristic polynomials to GMC measures has recently been established in [26] for the other classical compact groups, namely the orthogonal group O(N) and the symplectic group Sp(2N). More precisely, let $G(N) = O(N)$ or $Sp(N)$ (the latter case only makes sense when $N$ is even); then the suitably normalised powers $|P_N(\theta)|^{2s}$ again converge to GMC measures with $\gamma = \sqrt{2}s$ as before, but now the Gaussian field $X$ has covariance
$$\mathbb{E}\left[X(e^{i\theta})X(e^{i\phi})\right] = -\log\left|e^{i\theta} - e^{i\phi}\right| - \log\left|e^{i\theta} - e^{-i\phi}\right|. \qquad (1.18)$$
While this log-correlated field has an extra singularity at the complex conjugate and does not exactly satisfy (1.9), we believe that the GMC heuristic works equally well here, provided that the leading order is not dominated by the contribution from the neighbourhood of $\theta = 0$ or $\theta = \pi$, where the behaviour of the field $X$ is different.
To apply the GMC heuristic, we again extract the value of the $f$-function on the diagonal: as $\theta \to \phi$, the regular part of the covariance converges to $-\log|1 - e^{2i\theta}|$ everywhere except for $\phi \in \{0, \pi\}$, which is a set of measure zero and hence may be neglected, i.e. we may just take $f(\theta, \theta) = -\log|1 - e^{2i\theta}|$. Unlike the unitary case, we also need to pick a suitable $g$-function for (1.12), since $\log|P_N(\theta)|$ converges to a non-stationary Gaussian field with non-zero mean (see [26, Theorem 3]). For orthogonal matrices, it is known (e.g. [26, equation (4.32)]) that $\mathbb{E}_{O(N)}|P_N(\theta)|^{2s}$ behaves like $N^{s^2}\frac{G(1+s)^2}{G(1+2s)}|1 - e^{2i\theta}|^{-s^2+s}$, and so we may want to pick $g(\theta) = N^{s^2}\frac{G(1+s)^2}{G(1+2s)}|1 - e^{2i\theta}|^{-s^2+s}$. Note that this is a continuous function without any singularity when $s \leq 1$. Therefore:
Conjecture 1.7. For any $k > 1$ and $s = 1/\sqrt{k}$, the corresponding critical-subcritical asymptotics hold with the above choices of $f$ and $g$.
For the symplectic case, we have the analogous moment asymptotics (see e.g. [26, equation (4.18)]) for $\theta \notin \{0, \pi\}$. Naturally one would like to pick $g(\theta) = (2N)^{s^2}\frac{G(1+s)^2}{G(1+2s)}|1 - e^{2i\theta}|^{-s^2-s}$, but unlike the orthogonal case the 'density function' now has singularities at $\theta = 0$ and $\pi$. This violates our assumption on $g$ in Theorem 1.1, but perhaps the GMC heuristic may be extended to the current case provided that the integral on the RHS of (1.12) remains finite, i.e. the leading order asymptotics is not dominated by local contributions near $\theta \in \{0, \pi\}$, so that the non-uniformity of (1.19) and the different behaviour of the covariance kernel (1.18) near these singularities are irrelevant. Interestingly, this gives rise to a 'singularity threshold' determined by the golden ratio (Conjecture 1.8).
Remark 1.9. The leading order of the critical-subcritical moments of moments is conjectured to be $\Omega(N \log N)$ for all three classical compact groups U(N), O(N) and Sp(N). This is in stark contrast to the results in [1], where the exponent of $N$ depends on the underlying group; this is not surprising, however, because of the different singularity behaviour of the 'density function' $g$, which plays a role in determining the size of the leading order in those supercritical cases.
A note on Gaussian branching random walk. In [8], the asymptotics for moments of moments were considered for a toy model based on Gaussian branching random walk. For instance, if we take $\gamma = \sqrt{2}\beta > 0$, $k = 1/\beta^2 \in \mathbb{N}$ and $\epsilon = 2^{-n}$, then [8, Theorem 2.1] provides the corresponding critical moment asymptotics, where $X_\epsilon$ is a centred Gaussian field on $D = (0, 1)$ with covariance $\mathbb{E}[X_{2^{-n}}(x)X_{2^{-n}}(y)] = -\log(d(x, y) \vee 2^{-n})$, and $d(x, y) = 2^{-m}$ if the binary expansions of $x$ and $y$ agree only up to the first $m$ digits. Neither Theorem 1.1 nor Corollary 1.2 applies to this field, since $d(x, y)$ is not continuous with respect to the Euclidean distance (e.g. consider $x < 1/2 < y$ with $x, y$ arbitrarily close to $1/2$) and $-\log d(x, y)$ is not of the form (1.9). A one-sided bound may still be obtained, however, if we use Theorem 1.1 with a different Gaussian field $X_\epsilon = X * \nu_\epsilon$, where $X$ has covariance $\mathbb{E}[X(x)X(y)] = -\log|x - y|$. Indeed, as $|x - y| \leq d(x, y)$, the comparison principle from Lemma 2.2 suggests that the critical moments of the branching random walk model dominate those of the mollified model, and this inequality is most likely strict: the blow-up of the 'critical moment' should be attributed to very positive covariance (i.e. near the diagonal), which is the opposite of 'anti-diagonal' behaviour, i.e. the covariance tending to minus infinity when $\theta \to -\phi$.
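To make the discontinuity of the branching-random-walk metric concrete, here is a small self-contained check (our own illustration; the helper name `d` is ours) that $d$ dominates the Euclidean distance but jumps across the dyadic point $1/2$:

```python
def d(x, y, n_max=50):
    """Ultrametric distance: 2^{-m} if the binary expansions of x, y in (0,1)
    agree exactly up to the first m digits."""
    if x == y:
        return 0.0
    m = 0
    # compare the first (m+1) binary digits via integer truncation
    while m < n_max and int(x * 2 ** (m + 1)) == int(y * 2 ** (m + 1)):
        m += 1
    return 2.0 ** (-m)

pairs = [(0.1, 0.2), (0.26, 0.30), (0.499, 0.501), (0.7, 0.9)]
for x, y in pairs:
    # d dominates the Euclidean distance, consistent with |x - y| <= d(x, y)
    assert abs(x - y) <= d(x, y)

# Discontinuity across 1/2: 0.499 and 0.501 are Euclidean-close but their first
# binary digits already differ, so d = 1 and the covariance -log d(x, y) is 0
# rather than -log |x - y| (about 6.2 here).
```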

Main idea
Despite the connections to large deviations of GMC, our proof is orthogonal to that in [47]. As we shall see later, the analysis here is closely related to the asymptotics, as $T \to \infty$, of
$$\mathbb{E}\left[\left(\int_0^T e^{\gamma(B_t - \mu t)}\, dt\right)^{p_c - 1}\right], \qquad (1.20)$$
where $(B_t)_{t \geq 0}$ is a standard Brownian motion and $\mu = \frac{\gamma}{2}(p_c - 1) > 0$. This guiding toy model has appeared previously in the literature, e.g. in the renormalisability of Liouville CFT at the Seiberg bound [20] and in fusion estimates for GMC [9], but these papers deal with negative moments of exponential functionals of Brownian motion with non-negative drift, whose behaviour is drastically different from that in the current article.
Going back to (1.20), we note that $\int_0^\infty e^{\gamma(B_t - \mu t)}\, dt$ is actually almost surely finite. To understand what causes the blow-up when we compute the $(p_c - 1)$-th moment, we make use of Williams' path decomposition theorem (Theorem 2.9), which says that $(B_t - \mu t)_{t \geq 0}$
• behaves like a Brownian motion with positive drift $\mu$ (denoted by $(B^\mu_s)_{s \geq 0}$) until it hits a height $M$ that is independently distributed according to Exp(2µ);
• runs like $M - \widehat{B}^\mu_{t - \tau_M}$ once one goes beyond the first hitting time $\tau_M$ of $M$, where $(\widehat{B}^\mu_s)_{s \geq 0}$ is a Brownian motion with drift $\mu$ conditioned to stay positive.
With these ideas in mind, we can rewrite our exponential functional according to this decomposition and attribute the blow-up behaviour to the presence of the term $e^{\gamma(p_c - 1)M} = e^{2\mu M}$ in the expectation. The use of Williams' result is inspired by an earlier paper [39] on the tail expansions of 2-dimensional GMC measures, but the analysis there concerns negative moments and is closer in spirit to our "continuity from below" discussion in Lemma 3.1 and Appendix B. Our main theorem, however, concerns positive moments, which require a different treatment, and it is crucial to identify the range of $M$ that contributes to the leading order of the exponential functional. Not surprisingly, this is given by $\{M \leq \mu T\}$, as supported by law-of-large-numbers intuition.
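The blow-up mechanism can be seen numerically in the Brownian toy model (1.20). The sketch below is our own illustration, with toy parameters $\gamma = 1$ and $p_c = 2$ (so $\mu = \frac{\gamma}{2}(p_c - 1) = 1/2$); it estimates the truncated moment at two horizons and shows its growth in $T$, even though the untruncated functional is almost surely finite:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy parameters (our own choice): gamma = 1, p_c = 2, hence mu = gamma (p_c - 1) / 2
gamma, pc = 1.0, 2.0
mu = gamma * (pc - 1) / 2

dt, T, n_paths = 0.01, 10.0, 2000
n_steps = int(T / dt)
B = np.cumsum(rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt), axis=1)
t = dt * np.arange(1, n_steps + 1)

# running integral F(T') = int_0^{T'} exp(gamma (B_t - mu t)) dt along each path
F = np.cumsum(np.exp(gamma * (B - mu * t)), axis=1) * dt

def moment(T_):
    """(p_c - 1)-th moment of the functional truncated at horizon T_."""
    return float(np.mean(F[:, int(T_ / dt) - 1] ** (pc - 1)))

m5, m10 = moment(5.0), moment(10.0)
```

For $p_c = 2$ the truncated moment equals $\mathbb{E}\int_0^T e^{\gamma B_t - \mu t}\, dt = T$ exactly, so the estimate grows (here linearly) with the horizon, mirroring the logarithmic blow-up of the critical GMC moment under the usual identification $T \sim \log(1/\epsilon)$.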
Organisation of the paper. The remainder of this article is organised as follows.
• In Section 2, we compile a list of results that will be used in our proofs. This includes a collection of methods for the analysis of Gaussian processes, as well as various facts about path distribution of Brownian motions with drift.
• Our main proofs are presented in Section 3. We first study a preliminary estimate, Lemma 3.1, to illustrate how the path decomposition results may be used in our analysis. We then turn to Theorem 1.1, showing how the general problem may be reduced to one concerning the exactly log-correlated field, from which we obtain our main result. After establishing a technical estimate, we then extend our result to Corollary 1.2. The section concludes with the proof of Theorem 1.3 on CUE moments of moments.
• For completeness, we explain in the appendices the current status of the subcritical-subcritical asymptotics (1.6), and how the leading order coefficient in Theorem 1.1 may be seen as the renormalised limit of GMC moments from below. We also discuss the reason why GMC heuristics (1.8) for random matrix models should fail completely as one goes beyond the critical-subcritical regime, and comment briefly on the validity of the strong asymptotics for Toeplitz determinants with slowly merging singularities which is used in Section 3.4.

Gaussian processes
We start with our Gaussian toolbox. The first result we need is a standard comparison lemma based on Gaussian interpolation.
Lemma 2.2. Let $\rho$ be a Radon measure on $D$, let $F : \mathbb{R}_+ \to \mathbb{R}$ be a smooth function with at most polynomial growth at infinity, and suppose $G_0, G_1$ are two centred continuous Gaussian fields on $D$.
Take $G_0, G_1$ independent, and define the interpolation $G_t := \sqrt{t}\, G_1 + \sqrt{1 - t}\, G_0$ for $t \in [0, 1]$, with $\varphi(t) := \mathbb{E}[F(\int_D e^{G_t(x) - \frac{1}{2}\mathbb{E}[G_t(x)^2]}\, \rho(dx))]$. Then the derivative of $\varphi$ is given by Kahane's interpolation formula, which is signed as soon as $F$ is convex and the covariances are ordered. In particular, if
$$\mathbb{E}[G_0(x)G_0(y)] \leq \mathbb{E}[G_1(x)G_1(y)] \qquad \text{for all } x, y \in D,$$
then for any (integrable) convex function $F$ and (non-negative) measure $\mu$ on $D$, we have
$$\mathbb{E}\left[F\left(\int_D e^{G_0(x) - \frac{1}{2}\mathbb{E}[G_0(x)^2]}\, \mu(dx)\right)\right] \leq \mathbb{E}\left[F\left(\int_D e^{G_1(x) - \frac{1}{2}\mathbb{E}[G_1(x)^2]}\, \mu(dx)\right)\right].$$
A simple corollary to Lemma 2.2 (Corollary 2.3) is as follows: if the two covariances differ by at most some constant $C > 0$ uniformly on $D \times D$, then for any (non-negative) measure $\mu$, we have
$$\mathbb{E}\left[\left(\int_D e^{G_0(x) - \frac{1}{2}\mathbb{E}[G_0(x)^2]}\, \mu(dx)\right)^k\right] \leq e^{Ck(k-1)/2}\, \mathbb{E}\left[\left(\int_D e^{G_1(x) - \frac{1}{2}\mathbb{E}[G_1(x)^2]}\, \mu(dx)\right)^k\right]$$
for $k \geq 1$, and the inequality is reversed for $k \in [0, 1]$; in particular, a two-sided bound on the covariance difference yields two-sided moment comparisons.
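As a minimal sanity check of the comparison principle (our own toy example, not from the original text): for $F(x) = x^2$ and a two-point measure, the moment can be computed in closed form, $\mathbb{E}[(\sum_i e^{G(x_i) - \frac{1}{2}\mathrm{Var}\, G(x_i)})^2] = \sum_{i,j} e^{\mathrm{Cov}(G(x_i), G(x_j))}$, so the inequality reduces to entrywise monotonicity of the covariance:

```python
import math

def second_moment(cov):
    """E[(sum_i exp(G_i - var_i / 2))^2] = sum_{i,j} exp(cov[i][j])
    for a centred bivariate Gaussian vector -- exact, no sampling needed."""
    return sum(math.exp(cov[i][j]) for i in range(2) for j in range(2))

cov0 = [[1.0, 0.2], [0.2, 1.0]]   # smaller off-diagonal correlation
cov1 = [[1.0, 0.5], [0.5, 1.0]]   # dominates cov0 entrywise

# Kahane's comparison with the convex function F(x) = x^2
assert second_moment(cov0) <= second_moment(cov1)
```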

On Brownian motions and 3-dimensional Bessel processes
This subsection collects a few important results regarding the path distributions of Brownian motions, as well as of 3-dimensional Bessel processes starting from 0 (or BES0(3)-processes for short). We shall use the following notation throughout the entire paper.
• For $\mu \geq 0$: $(\widehat{B}^\mu_t)_{t \geq 0}$ denotes a Brownian motion with drift $\mu$ conditioned to stay non-negative, and $(\widehat{B}^\mu_t)_{t \in \mathbb{R}}$ a two-sided version of it, i.e. $(\widehat{B}^\mu_t)_{t \geq 0}$ and $(\widehat{B}^\mu_{-t})_{t \geq 0}$ are two independent copies. To be more precise, $(\widehat{B}^\mu_t)_{t \geq 0}$ is a Markov process starting from the origin with infinitesimal generator
$$\frac{1}{2}\frac{d^2}{dx^2} + \mu\coth(\mu x)\frac{d}{dx},$$
or equivalently the solution to the singular stochastic differential equation
$$d\widehat{B}^\mu_t = \mu\coth(\mu \widehat{B}^\mu_t)\, dt + dW_t, \qquad \widehat{B}^\mu_0 = 0, \qquad (2.4)$$
where $(W_t)_{t \geq 0}$ is the Wiener process. These definitions extend to the case $\mu = 0$, where $(\widehat{B}^0_t)_{t \geq 0}$ should be interpreted as a BES0(3)-process $(\beta_t)_{t \geq 0}$, which solves
$$d\beta_t = \beta_t^{-1}\, dt + dW_t, \qquad \beta_0 = 0.$$
(In particular, despite its colloquial name, the process $(\widehat{B}^\mu_t)_{t \geq 0}$, and similarly its two-sided analogue, has discontinuous drift at $t = 0$.) Note that $(\widehat{B}^\mu_t)_{t \geq 0}$ is known under different names in the literature, such as the radial part of a 3-dimensional Brownian motion with drift $\mu \geq 0$. We prefer to call it a conditioned drifting Brownian motion because this is the more commonly known name in the GMC-related literature. Indeed $(\widehat{B}^\mu_t)_{t \geq 0}$ can be constructed via such a conditioning procedure in the sense of Doob's theory of h-transforms; see e.g. [46, Section 2.4] and [37, Section 3] and the references therein.
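The $\mu\coth(\mu x)$ drift of the conditioned process interpolates between the BES0(3) drift $1/x$ (as $\mu \to 0$) and the constant drift $\mu$ (as $x \to \infty$). The following quick numerical check (our own illustration) verifies these limits, together with the monotonicity in $\mu$ that underlies the stochastic dominance discussed below:

```python
import math

def drift(mu, x):
    """Drift mu * coth(mu * x) of the conditioned drifting Brownian motion;
    reduces to the BES(3) drift 1/x in the mu = 0 limit."""
    return mu / math.tanh(mu * x) if mu > 0 else 1.0 / x

x = 2.0
# mu -> 0 recovers the Bessel drift 1/x
assert abs(drift(1e-8, x) - drift(0.0, x)) < 1e-9
# x -> infinity recovers the constant drift mu
assert abs(drift(1.0, 50.0) - 1.0) < 1e-12
# monotone increasing in mu for fixed x (stochastic dominance in mu)
vals = [drift(mu, x) for mu in (0.0, 0.25, 0.5, 1.0, 2.0)]
assert all(a < b for a, b in zip(vals, vals[1:]))
```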
Let us begin with an elementary lemma on stochastic dominance.
Proof. The stochastic order should follow immediately from a suitable conditioning construction, but we nevertheless verify it by showing that the drift in (2.4) is monotonically increasing in $\mu$: writing $h_\mu(x) := \mu\coth(\mu x)$ for any fixed $x > 0$, a direct computation shows $\partial_\mu h_\mu(x) > 0$.
We now mention the generalised "2M−B" theorem due to Rogers and Pitman, which provides a duality between $(B^\mu_t)_{t \geq 0}$ and $(\widehat{B}^\mu_t)_{t \geq 0}$.
Theorem 2.6 (cf. [37, Theorem 1]). Let $\mu \geq 0$, and write $S^\mu_t := \sup_{0 \leq s \leq t} B^\mu_s$. Then we have
$$\left(2S^\mu_t - B^\mu_t\right)_{t \geq 0} \overset{d}{=} \left(\widehat{B}^\mu_t\right)_{t \geq 0},$$
where the $\mu = 0$ case also holds if we interpret $(\widehat{B}^0_t)_{t \geq 0}$ as a BES0(3)-process $(\beta_t)_{t \geq 0}$ accordingly.
A simple but useful corollary of the above result is the following.
Proof. Given $(\widehat{B}^\mu_t)_{t \geq 0}$, we define $J^\mu_t := \inf_{s \geq t} \widehat{B}^\mu_s$ and set $B^\mu_t := 2J^\mu_t - \widehat{B}^\mu_t$. Then according to Theorem 2.6, $(B^\mu_t)_{t \geq 0}$ is a Brownian motion with drift $\mu$, and $J^\mu_t = S^\mu_t$ for any $t > 0$. We then observe that the relevant first passage time is a stopping time for $(B^\mu_t)_{t \geq 0}$, and by the strong Markov property the process restarted there is again a Brownian motion with drift $\mu$. The result follows from one final application of Theorem 2.6.
Note that Theorem 2.6 is intrinsically related to the work of Williams [46] (see Remarks in [37, Section 5] for a discussion of their connections), and we would like to highlight two important consequences that will play an indispensable role in our main proof. The first one concerns the time reversal of Brownian motion from its first hitting time.
Lemma 2.8 (cf. [37, Corollary 1]). Let $\mu > 0$ and suppose $\tau_x = \inf\{t > 0 : B^\mu_t = x\}$ is the first hitting time of $x > 0$ by $(B^\mu_t)_{t \geq 0}$. Then
$$\left(x - B^\mu_{\tau_x - t}\right)_{0 \leq t \leq \tau_x} \overset{d}{=} \left(\widehat{B}^\mu_t\right)_{0 \leq t \leq L_x},$$
where $L_x = \sup\{t > 0 : \widehat{B}^\mu_t = x\}$ is the last hitting time of $x$ by $(\widehat{B}^\mu_t)_{t \geq 0}$.
The second one is the celebrated result of Williams on the path decomposition of Brownian motions with drift.
Theorem 2.9 (cf. [46, Theorem 2.2], [37, Corollary 2]). Let $\mu > 0$ and consider the following independent objects:
• a random variable $M$ distributed according to Exp(2µ);
• $(B^\mu_t)_{t \geq 0}$, a Brownian motion with drift $\mu$;
• $(\widehat{B}^\mu_t)_{t \geq 0}$, a Brownian motion with drift $\mu$ conditioned to stay non-negative.
Then the process $(R_t)_{t \geq 0}$ defined by
$$R_t := \begin{cases} B^\mu_t, & t \leq \tau_M, \\ M - \widehat{B}^\mu_{t - \tau_M}, & t > \tau_M, \end{cases}$$
where $\tau_x = \inf\{t > 0 : B^\mu_t = x\}$ as usual, is a Brownian motion with drift $-\mu$.
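A quick simulation sanity check of the decomposition (our own, with arbitrary parameters): by Williams' theorem, the all-time maximum of a Brownian motion with drift $-\mu$ is Exp(2µ)-distributed, so its empirical mean should be close to $1/(2\mu)$ up to discretisation bias:

```python
import numpy as np

rng = np.random.default_rng(2)

mu, dt, T, n_paths = 1.0, 0.004, 8.0, 2000
n_steps = int(T / dt)
# Euler discretisation of Brownian motion with drift -mu
incr = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt) - mu * dt
paths = np.cumsum(incr, axis=1)
# all-time maximum (including the starting value 0; with drift -1 it is
# attained well before T = 8 with overwhelming probability)
M = np.maximum(paths.max(axis=1), 0.0)

mean_M = float(M.mean())   # Williams: M ~ Exp(2 mu), so E[M] = 1/(2 mu) = 0.5
```

The discrete-time maximum slightly undershoots the continuous one (by roughly $0.583\sqrt{dt}$ on average), which accounts for most of the residual error.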

Gaussian multiplicative chaos
In this subsection we collect a few facts and fix some notation about multiplicative chaos. Before we begin, let us recall that $X(\cdot)$ is said to be a log-correlated Gaussian field on a bounded open domain $D \subset \mathbb{R}^d$ if its covariance kernel is of the form (1.9), where (for the purpose of this paper) the function $f$ is continuous on $D \times D$. The logarithmic singularity on the diagonal of the covariance kernel means that $X(\cdot)$ is not defined pointwise and has to be interpreted as a generalised function: for any two test functions $\phi_1, \phi_2 \in C_c^\infty(D)$, the pair of random variables (formally written as) $X(\phi_i) = \int_D \phi_i(x)X(x)\, dx$, $i = 1, 2$, is jointly Gaussian with mean 0 and covariance given by
$$\mathbb{E}[X(\phi_1)X(\phi_2)] = \int_D\int_D \phi_1(x)\left(-\log|x - y| + f(x, y)\right)\phi_2(y)\, dx\, dy,$$
and the set of test "functions" may be extended to the set of finite (signed) measures with finite logarithmic energy. More concretely, the log-correlated field $X(\cdot)$ may be constructed via the Karhunen-Loève expansion and viewed as an abstract random variable in the negative Sobolev space $H^{-\epsilon}(\mathbb{R}^d)$ for any $\epsilon > 0$; we refer the interested readers to [28, Section 2.1] for a self-contained proof of the Sobolev regularity.
Given a log-correlated field $X(\cdot)$ on $D$ and $\gamma \in (0, \sqrt{2d})$, the associated Gaussian multiplicative chaos (GMC) is formally defined as the random measure
$$M_\gamma(dx) = e^{\gamma X(x) - \frac{\gamma^2}{2}\mathbb{E}[X(x)^2]}\, dx.$$
While the exponentiation of $X(\cdot)$ is not well defined, the GMC measure may be constructed via various methods, such as martingale limits [29] and smoothing procedures [38,4]; a generic approach based on the abstract language of Cameron-Martin spaces is also available [43]. The main takeaway is that all these constructions lead to the same random measure, and we are free to choose the most convenient one to work with. The following formulation is based on [4].
Theorem 2.10. Let $X(\cdot)$ be a log-correlated Gaussian field on $D \subset \mathbb{R}^d$ and $X_\epsilon := X * \nu_\epsilon$, where $\nu \in C_c^\infty(\mathbb{R}^d)$ is a mollifier. For any fixed $\gamma \in (0, \sqrt{2d})$, the sequence of random measures
$$M_{\gamma,\epsilon}(dx) := e^{\gamma X_\epsilon(x) - \frac{\gamma^2}{2}\mathbb{E}[X_\epsilon(x)^2]}\, dx$$
converges in probability, as $\epsilon \to 0^+$, to a non-trivial and non-atomic random measure $M_\gamma(dx)$ in the space of Radon measures on $D$ equipped with the weak* topology. Moreover, the limit $M_\gamma(dx)$ is independent of the choice of $\nu \in C_c^\infty(\mathbb{R}^d)$.
We will need to deal with GMC measures associated to exact fields and related objects; these are summarised in the definition below.
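As a small numerical illustration of the mollification construction (our own check; the bump mollifier and all parameters are arbitrary choices): away from the diagonal, the covariance of $X_\epsilon = X * \nu_\epsilon$ is only an $O(\epsilon^2)$ perturbation of $-\log|x - y|$ (here $d = 1$ and $f \equiv 0$):

```python
import numpy as np

def bump(u):
    """Standard C_c^infinity bump supported on (-1, 1), unnormalised."""
    out = np.zeros_like(u)
    inside = np.abs(u) < 1
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

eps = 0.01
u = np.linspace(-eps, eps, 201)
w = bump(u / eps)
w /= w.sum()                      # discrete mollifier weights, summing to 1

x, y = 0.0, 0.3                   # well-separated points: |x - y| >> eps
# E[X_eps(x) X_eps(y)] = double mollification of the kernel -log|x - y|
K = -np.log(np.abs((x + u)[:, None] - (y + u)[None, :]))
cov = float(w @ K @ w)

# matches -log|x - y| up to an O(eps^2) error
assert abs(cov - (-np.log(0.3))) < 1e-3
```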
Definition 2.11. Let $d \leq 2$. By exact field we mean the log-correlated Gaussian field $X$ defined on the (closed) unit ball with covariance
$$\mathbb{E}[X(x)X(y)] = -\log|x - y|, \qquad |x|, |y| \leq 1. \qquad (2.9)$$
The field $X$ can be decomposed into two independent Gaussian components
$$X(x) = B_{-\log|x|} + \widehat{X}(x),$$
where $(B_t)_{t \geq 0}$ is a standard Brownian motion and $\widehat{X}$ is the lateral part of $X$. We also denote by $Z_\gamma(dx) := e^{\gamma\widehat{X}(x) - \frac{\gamma^2}{2}\mathbb{E}[\widehat{X}(x)^2]}\, dx$ the GMC measure associated to $\widehat{X}$.
Remark 2.12. We elaborate on a few things in Definition 2.11 for those who may not be familiar with the theory of log-correlated fields and multiplicative chaos.
• The fact that the covariance kernel (2.9) of $X$ can be split into two independent parts follows from the radial-lateral decomposition of the Gaussian free field in $d = 2$ (the case $d = 1$ may be seen as a one-dimensional restriction). Indeed one can construct the Brownian motion $B_{-\log|x|}$ by considering the circle average
$$B_{-\log r} := \frac{1}{2\pi}\int_0^{2\pi} X(re^{i\theta})\, d\theta, \qquad r \in (0, 1].$$
One can then define $\widehat{X}(x) := X(x) - B_{-\log|x|}$ and verify (using rotational symmetry) that the two components are uncorrelated, which implies independence due to Gaussianity.
• A priori, the field $\widehat{X}$ (often known as the lateral part of $X$) is defined on $B(0, 1)$ in the sense of a generalised Gaussian field, but its scale invariance (see Lemma 2.13 below) implies that it may be extended to the entire $\mathbb{R}^2$ (on a suitable probability space). Since $B_{-\log|\cdot|}$ has a continuous version (as a Brownian motion), $\widehat{X}$ inherits the regularity of $X$ and may be seen as an abstract random variable living in a (local) Sobolev space of negative exponent.
• $Z_\gamma(\cdot)$ (and its formal expression) should be interpreted in the sense of Theorem 2.10, i.e. as the weak* limit (in probability) of
$$Z_{\gamma,\epsilon}(dx) := e^{\gamma\widehat{X}_\epsilon(x) - \frac{\gamma^2}{2}\mathbb{E}[\widehat{X}_\epsilon(x)^2]}\, dx,$$
where $\widehat{X}_\epsilon(\cdot) := (\widehat{X} * \nu_\epsilon)(\cdot)$, and again the limit is independent of the choice of mollifier $\nu \in C_c^\infty(\mathbb{R}^d)$. Similar to $\widehat{X}$, the domain of $Z_\gamma(d\cdot)$ can be extended to the entire space.
For future reference, we record the invariance property of X and Zγ in the following lemma.
Lemma 2.13. For any $c > 0$, the lateral field satisfies the scale invariance $\widehat{X}(c\,\cdot) \overset{d}{=} \widehat{X}(\cdot)$, with the corresponding invariance for $Z_\gamma$.
Proof. The scale invariance of $\widehat{X}$ follows from a simple covariance check. As for the invariance of $Z_\gamma$, let us start by considering $Z_{\gamma,\epsilon}$ associated to $\widehat{X}_\epsilon := \widehat{X} * \nu_\epsilon$ for some mollifier $\nu$: by the scale invariance of $\widehat{X}$, the measure $Z_{\gamma,\epsilon}$ and its rescaled version agree in distribution, and sending $\epsilon \to 0^+$ on both sides this implies the claim for $Z_\gamma \circ \varphi_c$.
Let us also recall from [47, Appendix A] the various probabilistic representations of the reflection coefficient $C_{\gamma,d}$.
Lemma 2.14. Let $d \leq 2$, $\gamma \in (0, \sqrt{2d})$ and $p_c = 2d/\gamma^2$. The reflection coefficient $C_{\gamma,d}$ appearing in Theorem 1.1 is equal to the limit (2.14), where $\overline{M}_\gamma(dx) := e^{\gamma X(x) - \frac{\gamma^2}{2}\mathbb{E}[X(x)^2]}\, dx$ is the Gaussian multiplicative chaos associated to the exact field $X(\cdot)$, and this limit is independent of the value of $r \in (0, 1]$. In particular, it admits an alternative probabilistic representation.
Remark 2.15. The fact that the limit (2.14) is independent of the value $r > 0$ is a consequence of the fact that $\mathbb{E}[\overline{M}_\gamma(B(0, 1))^{p_c - 1}] < \infty$; see [47, Theorem 1] for finer results regarding the tail asymptotics of subcritical GMCs.
As for the fact that the probabilistic representations in Lemma 2.14 coincide with the expressions in (1.13), this requires a long detour through the exact integrability of Liouville conformal field theory, which is beyond the scope of this paper. Interested readers may consult [39, Sections 1.1 and 4.2] and [47, Appendix A] for a brief discussion, and [31, 36] for the derivations of these remarkable formulae.

Some estimates
In this subsection we collect some basic estimates. The first one concerns the construction of GMC measures via mollification.

Lemma 2.16. Using the notation of Theorem 2.10, the following are true: where $\sup_{\epsilon>0}\sup_{x,y\in D}|f_\epsilon(x,y)| < \infty$.
Lemma 2.17. Consider a collection of continuous Gaussian fields $\{X_\epsilon(\cdot)\}_{\epsilon>0}$ on D whose covariance is of the log-correlated form above. Then for any $\alpha \in [0, \frac{2d}{\gamma^2})$ and $\kappa \geq 0$, there exists some C > 0, possibly depending on $C_f, \gamma, \alpha, \kappa$ but independent of $\epsilon, r$, such that the estimates (2.16) and (2.17) hold for the associated random measures.

Proof. Both of these estimates are standard in the literature, but for completeness we sketch the claim for $\alpha \in (0, \frac{2d}{\gamma^2})$ (the case $\alpha = 0$ is trivial). For convenience, the constant C > 0 below may vary from line to line, but its uniformity should be evident from the proof.
Let us start with (2.16). We decompose the mass over dyadic annuli, with $M_{\gamma,\epsilon}(A(2^{-n-1}r, 2^{-n}r))$ denoting the contribution from the annulus $A(2^{-n-1}r, 2^{-n}r)$. For C > 0 sufficiently large (but independent of $\epsilon, r, n$), we may compare with a mollified exact field $X_\epsilon = X * \nu_\epsilon$ (see the first claim in Lemma 2.16). If we write $\overline{M}_{\gamma,\epsilon}$ for the GMC measure associated to $X_\epsilon$ and interpret $-\log(2^{-n+1}r)$ as the variance of an independent Gaussian random variable $N_{n,r}$, then Corollary 2.3 implies a bound whose last inequality follows from the uniform integrability of $\overline{M}_{\gamma,\epsilon}(B(0, \frac12))^\alpha$ (see the second claim in Lemma 2.16). Substituting this back into (2.18), we obtain the desired estimate. The same principle also yields the analogous bound with $n_* = \lfloor \log_2(r/\epsilon) \rfloor$.
Next we derive (2.17) from (2.16). For $\alpha \in (0, 1]$, the bound (2.17) follows directly. As for $\alpha > 1$, the inequality again follows from similar arguments applied to $\mathbb{E}\int_{\epsilon \le |x| \le r} |x|^{-\kappa\gamma}\, M_{\gamma,\epsilon}(dx)$.

The next lemma is an elementary estimate that allows us to remove irrelevant mass from our asymptotic analysis in Section 3.2.1.
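For orientation, we record a standard fact (stated here without proof, and in a form that may differ from (2.16) by the $\kappa$-weight): the subcritical GMC mass of a small ball obeys the multifractal scaling

```latex
\mathbb{E}\big[ M_\gamma(B(0,r))^{\alpha} \big] \;\asymp\; r^{\zeta(\alpha)},
\qquad
\zeta(\alpha) = \Big(d + \tfrac{\gamma^2}{2}\Big)\alpha - \tfrac{\gamma^2}{2}\alpha^2,
\qquad \alpha \in \Big[0, \tfrac{2d}{\gamma^2}\Big),
```

and the dyadic-annuli argument above is one standard route to the upper bound.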
Lemma 2.18. Let k > 0, and let A, B be two non-negative random variables with finite k-th moments.

(i) Suppose there exists some $\eta > 0$ such that the moment bound in the statement holds.

Proof. We begin with the first statement. For $k \geq 1$, one can easily check the claimed elementary inequality (say by computing the first derivative), while the case $k \in (0, 1)$ follows from subadditivity. As for the second statement, the case $k \in (0, 1)$ follows from our assumption, while the case $k \geq 1$ follows from an application of Hölder's inequality. This concludes the proof.

Main proofs
For convenience, we shall write p = pc = 2d/γ 2 throughout the entire Section 3.

Warm-up: a limit from below
As advertised in Section 1.5, the proof of Theorem 1.1 largely boils down to the analysis of positive moments of the exponential functional of Brownian motion with negative drift, with the help of Williams' path decomposition.
To offer a first glance at how some of the results in Section 2.2 may be applied, we make a digression and study the limit of a much simpler but related quantity. The following result, while not needed for the proof of Theorem 1.1, will be used when we sketch a renormalisation result in Appendix B.
The fact that (3.1) is independent of $r \in (0, 1)$ follows from the same reasoning as in Remark 2.15. Indeed, for $0 < \alpha < p - 1 \leq 1$ we have a comparison involving the GMC measure $M_\gamma$ associated to X, in which the last term on the RHS can be bounded uniformly in $\alpha \leq p - 1$ by the existence of GMC moments (Lemma 2.16), and hence does not affect the limit in (3.1). The case $p - 1 > 1$ is similar. For this reason, we shall assume for simplicity that r = 1 in the proof below.
Proof. Starting with the radial decomposition of our exact field, we apply the change of variables $x = e^{-t}$ for d = 1 and $x = e^{-t}e^{i\theta}$ for d = 2 and rewrite the $\alpha$-th moment accordingly. The next step is to re-express $(B_t - \mu t)_{t \geq 0}$ using Lemma 2.8 and Theorem 2.9. If $\phi_c : x \mapsto x + c$ is the shift operator, the path may be described in terms of two independent Brownian motions $(B^\mu_{\pm t})_{t \geq 0}$ with drift $\mu$ conditioned to stay positive, an independent Exp($2\mu$) random variable M, and $L_M := \sup\{t > 0 : B^\mu_{-t} = M\}$. Note that the removal of $\phi_M$ in the last equality above follows from the translation invariance of $Z_\gamma(dt)$, which is inherited from the scale invariance of $\widehat{X}$ (see Lemma 2.13).

Now fix any x > 0. It is enough to focus on the lower bound, since along the way we will see that the upper bound is tight as $\alpha \to (p-1)^-$. By independence, we compute the two factors separately. Substituting $2\mu = \gamma(p-1)$ back into our expression, we obtain the lim inf bound. As x > 0 is arbitrary, we let $x \to \infty$ so that $L_x \to \infty$ almost surely, and the lower bound matches the upper bound. This concludes the proof.
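As a numerical illustration (not part of the proof; all parameter values below are our own choices), one can Monte-Carlo the exponential functional $\int_0^T e^{\gamma(B_t - \mu t)}\,dt$ with $\mu = \frac{\gamma}{2}(p-1)$ and watch its $\alpha$-th moment inflate as $\alpha \uparrow p - 1$:

```python
import numpy as np

def exp_functional_moment(gamma=1.0, p=2.0, alpha=0.5, T=30.0, dt=0.02,
                          n_paths=1000, rng=None):
    """Monte Carlo estimate of E[(int_0^T exp(gamma*(B_t - mu*t)) dt)^alpha]
    with drift mu = gamma*(p-1)/2, the toy model behind the warm-up lemma.
    Parameter values are illustrative and not taken from the paper."""
    rng = np.random.default_rng(0) if rng is None else rng
    mu = gamma * (p - 1) / 2.0
    n = int(T / dt)
    # Brownian increments and paths, shape (n_paths, n)
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
    B = np.cumsum(dB, axis=1)
    t = dt * np.arange(1, n + 1)
    # Riemann-sum approximation of the exponential functional
    I = np.exp(gamma * (B - mu * t)).sum(axis=1) * dt
    return float((I ** alpha).mean())

# With p = 2 (so p - 1 = 1), the alpha-th moment is finite for alpha < 1
# but blows up at alpha = p - 1.
m_low = exp_functional_moment(alpha=0.5, rng=np.random.default_rng(0))
m_high = exp_functional_moment(alpha=0.95, rng=np.random.default_rng(0))
print(m_low, m_high)
```

On the same simulated paths, the estimate at $\alpha = 0.95$ is markedly larger than at $\alpha = 0.5$, consistent with the blow-up of the $(p-1)$-th moment described in (3.1).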

Proof of Theorem 1.1
Recall that $X_\epsilon = X * \nu_\epsilon$ for some mollifier $\nu \in C_c^\infty(\mathbb{R}^d)$, and we shall assume without loss of generality that $\mathrm{supp}(\nu) \subset B(0, \frac14)$. The proof of our main theorem consists of three parts:

1. We first remove the part of the mass in $M_{\gamma,\epsilon}(D)$ which does not contribute to the blow-up.
2. We then argue that the problem may be reduced to one regarding the exact field (2.8).
3. Finally, we prove that the p-th moment, after suitable renormalisation, converges to the probabilistic representation (2.15) of the reflection coefficient of GMC as ǫ → 0 + .
According to Lemma 2.16, the covariance structure of $X_\epsilon$ is of the stated form with $\sup_{\epsilon>0}\sup_{x,y\in D}|f_\epsilon(x,y)| < \infty$. In the following we shall assume without loss of generality that $f, f_\epsilon \geq 0$ everywhere in the domain. Indeed, since $-\log r + f(x, y) \geq 0$ uniformly for r sufficiently small, we can always rewrite our expression in terms of another log-correlated field (by scaling the coordinates by r) whose 'f'-function satisfies the non-negativity condition, namely $X^{(r)}(\cdot) := X(r\cdot)$ on the new domain D/r.
The same argument also applies to $f_\epsilon$. The reduction to the case where $f, f_\epsilon \geq 0$ everywhere is only legitimate if Theorem 1.1 is invariant under rescaling of space, which we quickly verify now. If we rewrite the statement with $g_r(u) := r^d g(ru)$, which still satisfies the non-negativity and continuity assumptions, then Theorem 1.1 leads to the same asymptotics (as $\epsilon \to 0^+$), since r > 0 is fixed and its appearance in the logarithm does not affect the leading order behaviour as $\epsilon \to 0$. For simplicity our proof below treats the case where $g \equiv 1$ on D, even though the analysis can easily be adapted to continuous $g \geq 0$ (see Remark 3.5 before Section 3.3).

Pre-processing: removal of irrelevant mass
Let us recall that $p = 2d/\gamma^2 > 1$. By the Cameron–Martin theorem (see Lemma 2.1) and the expression (3.2), we may fix $u \in D$ and consider the integrand in (3.3).

To proceed, we use the fact (a consequence of Lemma 2.18) that we can remove any mass inside the integral in (3.3) provided that it does not contribute to the leading order blow-up. The first piece we remove depends on a parameter K > 1; for this we need to show that its $(p-1)$-th moment remains bounded as $\epsilon \to 0$, and this is true because the first inequality follows from the fact that $f_\epsilon$ may be uniformly bounded (see Lemma 2.16), and the second from the first estimate in Lemma 2.17.

Actually we will remove more mass from our analysis. Let r > 0 be some fixed number; then the RHS stays bounded as $\epsilon \to 0$ by the convergence and existence of the $(p-1)$-th moment of Gaussian multiplicative chaos (see Lemma 2.16 and Lemma 2.17). Summarising the work so far, we have
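Schematically, the Cameron–Martin step reads as follows (our own sketch, with the bounded error term $f_\epsilon$ suppressed and normalisation as in Theorem 2.10):

```latex
\mathbb{E}\big[M_{\gamma,\epsilon}(D)^{p}\big]
 = \int_D \mathbb{E}\Big[ e^{\gamma X_\epsilon(u) - \frac{\gamma^2}{2}\mathbb{E}[X_\epsilon(u)^2]}
      \, M_{\gamma,\epsilon}(D)^{p-1} \Big]\, du
 = \int_D \mathbb{E}\bigg[ \Big( \int_D e^{\gamma^2 \mathbb{E}[X_\epsilon(x) X_\epsilon(u)]}
      \, M_{\gamma,\epsilon}(dx) \Big)^{p-1} \bigg]\, du,
```

where the second equality is the Girsanov shift $X_\epsilon \mapsto X_\epsilon + \gamma\,\mathbb{E}[X_\epsilon(\cdot)X_\epsilon(u)]$. Since $\mathbb{E}[X_\epsilon(x)X_\epsilon(u)] \approx -\log|x-u|$, the inner integrand carries the singular weight $|x-u|^{-\gamma^2}$, and the analysis localises around $u$.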

Reduction to exact fields
The next step is to rewrite (3.5) in terms of something more analytically tractable. We claim the following.

Proof. Recall our earlier reduction to the case where $f, f_\epsilon \geq 0$ everywhere, and without loss of generality suppose $r \leq 1/4$. We consider a log-correlated random field $X^{(u)}(\cdot)$ on B(u, 4r) with the covariance given above. This field can easily be realised as the exact field in (2.8) up to a shift of domain, i.e. $X^{(u)}(\cdot) := X(\cdot - u)$.
Since f is continuous (and hence uniformly continuous) on the compact set $D \times D$, it has a modulus of continuity, say $\omega(\cdot)$. If $N_u \sim \mathcal{N}(0, f(u, u))$ is independent of everything else, then the relevant covariance discrepancy is controlled by $\omega$ for all $x, y \in D \cap \{v : K\epsilon \leq |v - u| \leq r\}$, and this upper bound goes to 0 as $\epsilon, r \to 0^+$. Applying Corollary 2.3, we can perform a Gaussian comparison and obtain an estimate that is uniform in $u \in D$, and so it suffices to consider the numerator (3.6). We note the following observations:

• $N_u$ is independent of $X^{(u)}$ and its expectation may be evaluated directly.
• We may replace $e^{\gamma^2 f_\epsilon(x,u)}$ by $e^{\gamma^2 f(u,u)}$ and pull it out of the expectation, up to a multiplicative error which vanishes as $\epsilon \to 0$, $K \to \infty$ and $r \to 0$. This is true for $|x - u| \geq K\epsilon \geq \epsilon$, where the last inequality follows from the assumption that $\mathrm{supp}(\nu) \subset B(0, \frac14)$.

Taking all these considerations together, (3.6) can be rewritten, up to a multiplicative error of $C_{K,r}(\epsilon)$ as defined in the statement of the lemma, as the desired expression.

Convergence to the reflection coefficient
The final ingredient we need is the following asymptotic formula.
Proof of Theorem 1.1. Note that for any $u \in D$ we have a trivial uniform bound. Combining this with Lemma 3.2, Lemma 3.3 and again Lemma 3.4, we see that the upper and lower asymptotics match up to errors that vanish as $K \to \infty$ and $r \to 0$. Since K, r > 0 are arbitrary, we send $K \to \infty$ and $r \to 0$, and conclude our proof by identifying the resulting limit as the probabilistic representation of $C_{\gamma,d}$ via Lemma 2.14.
The rest of the section is devoted to proving Lemma 3.4, which is broken into several steps.
Step 1: simplifying the radial part. As in the proof of Lemma 3.1, we start by considering the radial decomposition of the exact field X and extend it to its regularised version, i.e.
Step 2: a uniform upper bound. We claim that the integrand in (3.9) is uniformly bounded in m and $\epsilon$ (for any fixed K > 1 and $r \in (0, \frac14)$). It is more convenient to work with the representation (3.10), and we will show the bound (3.11) for some constant $C_p \in (0, \infty)$ independent of m and $\epsilon$. For future reference, we will actually show a slightly strengthened version of (3.11): for any $c \geq 0$, the analogous bound holds with some $C_p(c) \in (0, \infty)$ independent of m and $\epsilon$, such that $C_p(c)$ decreases monotonically to 0 as $c \to \infty$. There are two cases to consider:

• $p - 1 \geq 1$. First observe the distributional equality $Z_{\gamma,\epsilon} \circ \phi_c(dt) \stackrel{d}{=} Z_{\gamma, \epsilon e^{c}}(dt)$ for any $c \geq 0$, possibly random but independent of $\widehat{X}$ (see (2.12) and (2.13) in the proof of Lemma 2.13 for the details). Therefore, our integrand may be bounded by first rewriting it as a conditional expectation and then performing Gaussian comparison inside. When c = 0, the expectation appearing in the last bound is the probabilistic representation of $C_{\gamma,d}$ (see Lemma 2.14) and is thus finite. If we take the above bound as the definition of $C_p(c)$, then $C_p(c)$ satisfies all the desired properties (uniformity in m, $\epsilon$ and monotone convergence to 0 as $c \to \infty$) in our claim.
• $p - 1 \in (0, 1)$, i.e. the map $x \mapsto x^{p-1}$ is concave. By Jensen's inequality and subadditivity, we obtain a bound which we may again take as the definition of $C_p(c)$; it obviously satisfies all the desired properties except the claim that $C_p(0) < \infty$, which we verify now. By Lemma 2.5, $(B^\mu_t)_{t \geq 0}$ stochastically dominates a BES₀(3) process, which we denote by $(\beta_t)_{t \geq 0}$. In particular, using standard growth estimates for $\beta_t$, the relevant expectation is finite and we are done.
To analyse the integrand further, we fix some small $\delta \in (0, 1/4)$ and proceed by splitting the m-integral into two regions.
Step 3: main contribution from $m \leq \mu(1 - 2\delta)T$. Let us fix some $m_0 > 0$ and ignore all contributions to the integral from $m < m_0$. We recall from Lemma 2.8 that the law of $L_m$ is the same as that of $\tau_m := \inf\{t > 0 : B^\mu_t = m\}$, and so we may work on a suitable event, where the last inequality just follows from truncating the integral at a deterministic (but arbitrary) level $M \leq \delta T$ (which is allowed since $T = -\log K\epsilon$ is arbitrarily large as $\epsilon \to 0^+$). Since $L_m$ is almost surely finite and depends only on $(B^\mu_t)_{t \in \mathbb{R}}$ (in particular it is independent of $\widehat{X}$), the measure $Z_{\gamma, \epsilon e^{L_m}}$ may be substituted. Since M > 0 is arbitrary, we obtain from (3.13) the lower bound (3.14).

As for the upper bound, consider the following two cases:

• $p - 1 \geq 1$. We can upper bound the $(p-1)$-th root of (3.15) by splitting off the main term.

• $p - 1 \in (0, 1)$. We can upper bound (3.15) directly by subadditivity.

Under either scenario, the first term provides the main contribution, and the same argument as for the lower bound applies. On the other hand, the uniform bound from Step 2 gives us control of the residual term, where $C_p(\cdot) \in (0, \infty)$ is defined in (3.12) and satisfies $C_p(M) \to 0$ as $M \to \infty$. Since M > 0 is arbitrary, we arrive at the upper bound (3.16). Combining (3.14) and (3.16), the lower bound matches the upper bound as we send $m_0$ to infinity, and the limit coincides with $\mu(1 - 2\delta)C_{\gamma,d}$ by (2.15) in Lemma 2.14.
Step 4: remaining contribution. For $m \in [\mu(1-2\delta)T, \mu(1+4\delta)T]$, the crude bound for the integrand from Step 2 shows that this range contributes at most $O(\delta)$ after the $\frac{1}{T}$-normalisation, where $C_p$ is defined in (3.11). It thus suffices to control the remaining contribution from $m \geq \mu(1 + 4\delta)T$. We consider a different event $A_\delta$ here. On this event, it is actually more convenient to work with the original representation of the integrand, i.e. (3.9). Since $\tau_m > \tau_{(1-\delta)m} > T$ and $\max_{t \leq T} B^\mu_t \leq (1-\delta)m$, we obtain a bound in which the second inequality follows from time reversal (Lemma 2.8), and the constant $C_p < \infty$ in the last inequality is again the bound in (3.11).
On the complementary event, we abuse notation and use the dual interpretation $A_\delta^c = \{L_{(1-\delta)m} \leq (1+\delta)T\}$ when applying time reversal. Using the fact that $\mathbb{P}(\mathcal{N}(0,1) > t) \leq e^{-t^2/2}$ and $m \geq \mu(1+4\delta)T$, we obtain a Gaussian tail bound by independence (as $A_\delta^c$ depends only on $(B^\mu_t)_{t \leq 0}$). As for the first term, pick $\alpha > 1$ small enough so that $1 \leq \alpha(p-1) < p$; by Hölder's inequality and Gaussian comparison (replacing $Z_{\gamma,\epsilon} \circ \phi_{L_m}$ with $Z_\gamma$ up to a multiplicative error C > 0), our task reduces to showing that (3.17) holds. For this, consider the quantity (3.18), where the last equality follows from Corollary 2.7. To analyse the expectation, we first split it into two parts: one is finite by translation invariance (Lemma 2.13) and the existence of GMC moments, while for the other, Theorem 2.6 applies. Applying Hölder's inequality, (3.19) can be further bounded using the existence of GMC moments again. Therefore, (3.18) and hence (3.17) are finite, as claimed.
Gathering all the terms, we see that there exists some constant C > 0, independent of T (or equivalently $\epsilon$) and of $m \geq \mu(1+4\delta)T$ (but possibly depending on other parameters), such that the contribution from this range is negligible for $\delta > 0$ sufficiently small.
Step 5: conclusion. Summarising all the work we have done so far, we have a two-sided estimate with $T = -\log K\epsilon$. Dividing both sides by $-\log\epsilon$ and taking the limit $\epsilon \to 0^+$, the RHS becomes $(1 + O(K^{-1}))\,2\mu^2 C_{\gamma,d} + O(\delta)$. Since K > 1 and $\delta > 0$ are arbitrary, we send K to infinity and $\delta$ to zero and obtain the desired result.

Remark 3.5. When r is small, $g(x) \approx g(u)$ for $|x - u| \leq r$, and so the integral above can be further approximated accordingly; no further changes are needed for the rest of the analysis. This explains the appearance of $g(u)^p$ in the leading order coefficient in Theorem 1.1.
Proof. To simplify notation, let us verify the inequality for $x_0 = 0$, assuming that the domain contains the origin and that $\mathrm{diam}(D) := \sup\{|x - y| : x, y \in D\} \leq 1$. The constant C > 0 below will vary from line to line, but its uniformity in $\epsilon$, $\delta$ and $x_0 \in D$ should be evident from the proof.
To begin with, consider (3.20) Before we continue with our bounds, let us mention that the uniformity of C > 0 over the class of fields Xǫ stated in the lemma is a simple consequence of the fact that the expectation that appears in the integrand in (3.20) can be bounded uniformly in this class via Gaussian comparison (see Corollary 2.3). Now, let us split the v-integral in (3.20) into two parts, namely when |u − v| ≤ 2ǫ or 2ǫ ≤ |u − v| ≤ r. We proceed by considering two separate cases.
• Case 1: $p \geq 2$. For the first part, where $|u - v| \leq 2\epsilon$, we have a bound whose second-to-last inequality follows from Lemma 2.17. As for the part where $|u - v| \in [2\epsilon, r]$, we first focus on the expected value in (3.20) and split it into two terms, depending on whether x is closer to u or to v. By symmetry, it suffices to deal with one of them, again by Lemma 2.17. Combining the two parts yields the desired inequality.
• Case 2: $p < 2$. For the first part, where $|u - v| \leq 2\epsilon$, we compare the Gaussian field $X_\epsilon$ with a Gaussian random variable $N_\epsilon$ with variance $-\log\epsilon$ on $\{|x - u| \leq \epsilon\}$, and apply Corollary 2.3; the resulting bound is uniform in $u \in D$. The second part may be treated in a similar way. If $|x - u| \leq \frac12|u - v|$, it is easy to check by the triangle inequality that the relevant kernel comparison holds for any $x, y \in B(u, c)$. Arguing as before (combined with the convergence and existence of GMC moments, see Lemma 2.16), we can obtain a bound for the last expectation that is uniform for all $u, v \in D$ and $\epsilon > 0$. Hence the above expression is bounded by $C r^d \log\frac{r}{\epsilon}$, and the proof is complete.
Proof of Corollary 1.2. Again, for simplicity we just treat the case $g \equiv 1$. Let us construct, on the same probability space, a Gaussian field X with covariance (1.9), independent of $\{X_\epsilon\}_\epsilon$, and write $\widetilde{X}_\epsilon(x) = (X * \nu_\epsilon)(x)$ for some mollifier $\nu \in C_c^\infty(\mathbb{R}^d)$. We further introduce the interpolated measures $M^t_{\gamma,\epsilon}$ for all $t \in [0, 1]$. Using the interpolation formula from Lemma 2.2, we obtain (3.22).

Now fix $\delta > 0$ and write $r = \epsilon^{1-\delta}$. By assumption, we have for $\epsilon > 0$ sufficiently small a covariance bound which, together with the other assumption on the covariance of $X_\epsilon$, allows us to bound (3.22) by (3.23). Using Gaussian comparison, $\mathbb{E}[M^t_{\gamma,\epsilon}(D)^p]$ is, up to a multiplicative factor, upper bounded by $\mathbb{E}[M^1_{\gamma,\epsilon}(D)^p]$. It then follows from Theorem 1.1 that the first term in (3.23) is of order $O(\delta \log\frac{1}{\epsilon})$. As for the second term, we can bound it using a collection of points $D_r$ such that $D \subset \bigcup_{x \in D_r} B(x, r)$; we can of course choose $D_r$ with cardinality at most $O(r^{-d})$. Combining this with Lemma 3.6 (which holds uniformly in $t \in [0, 1]$), we see that the second term is also $O(\delta \log\frac{1}{\epsilon})$, and therefore the lim sup is controlled. Since $\delta > 0$ is arbitrary, we let $\delta \to 0^+$ and conclude that $\mathbb{E}[M^1_{\gamma,\epsilon}(D)^p]$ has the same leading order asymptotics as $\mathbb{E}[M^0_{\gamma,\epsilon}(D)^p]$, which is described by Theorem 1.1.

On circular unitary ensemble: proof of Theorem 1.3
The k = 2 case was first proved in [14, Theorem 1.15], relying on the full Painlevé V description of the strong asymptotics of Toeplitz determinants with 2 merging singularities, but it is not clear how this may be extended to deal with k > 2 singularities with arbitrary merging behaviour. We shall show, however, that the asymptotics for critical-subcritical moments can be derived using only the following ingredients.
Lemma 3.7. The following estimates hold for any integer $k \geq 1$.
Remark 3.8. The asymptotics in Lemma 3.7 (i) was originally stated for fixed singularities θ1, . . . , θ k ∈ [0, 2π) in [17], but can be shown to hold in the current setting. We explain this briefly in Appendix D.
(In this proof we use C > 0 to denote a constant which varies from line to line but is uniform in every relevant parameter.) In particular, (3.25) is of order $O(\delta_1 \log\frac{1}{\epsilon})$, and we have verified (3.24).
If $k \geq 3$ we continue to 'bring down the powers'. Let us now pick $\delta_2 > 0$ such that $2\delta_2 < \delta_1$, and consider the analogous decomposition. For $\epsilon > 0$ sufficiently small, we may assume without loss of generality that $\epsilon^{1-\delta_1} \geq 10\epsilon^{1-2\delta_2}$, so that the second and third inequalities follow from the multifractal estimates (Lemma 2.17) again. In particular, using the basic estimate from Lemma 2.18, we obtain a further reduction for some suitable $c(\delta_2; \epsilon)$. This procedure can be repeated until we bring down all the powers, leading to an identity with some suitable $c(\delta_1, \dots, \delta_{k-1}; \epsilon)$, which necessarily implies the validity of (3.24). Since all the $\delta_j$'s are arbitrary, we conclude the claimed asymptotics as $\epsilon \to 0^+$, and our proof is complete.

A Circular unitary ensemble and subcritical GMCs
Earlier in the introduction, we mentioned that the asymptotics (1.6) for MoM$_{U(N)}(k, s)$ is essentially resolved in the subcritical regime where $k < 1/s^2$ and $0 < s < 1$. By 'essentially resolved' we mean that the GMC argument can be made rigorous provided that one establishes the convergence of moments (A.1), where $\gamma = \sqrt{2s}$ and $M_\gamma(d\theta) = e^{\gamma X(e^{i\theta}) - \frac{\gamma^2}{2}\mathbb{E}[X(e^{i\theta})^2]}\,d\theta$ is the GMC measure associated to the Gaussian free field (1.4). In many situations, the condition (A.1) can be verified. For instance, if $k \in \mathbb{N}$, we can consider the expanded moments (1.2) and show that the main contribution to the multiple integral comes from the set where the angles $\theta_1, \dots, \theta_k$ are at macroscopic distance from one another, in which case existing asymptotics for Toeplitz determinants with singularities may be applied. This was done as part of [23, Corollary 2.1].
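For a concrete (purely illustrative) picture of the objects in (A.1), one can simulate an approximation of $M_\gamma$ by truncating the Fourier series of the free field on the circle: the covariance $\sum_{k \le K} \cos(k(\theta-\theta'))/k$ converges to $-\log|e^{i\theta} - e^{i\theta'}|$ as $K \to \infty$. All parameters below are our own choices:

```python
import numpy as np

def circle_gmc_mass(gamma=0.5, K=256, n_grid=512, rng=None):
    """Sample the total mass of an approximate GMC measure on the unit
    circle.  The field X_K(t) = sum_{k<=K} k^{-1/2}(A_k cos kt + B_k sin kt)
    has covariance sum_{k<=K} cos(k(t-t'))/k, a truncation of the
    log-correlated kernel.  Parameters are illustrative."""
    rng = np.random.default_rng(0) if rng is None else rng
    theta = np.linspace(0, 2 * np.pi, n_grid, endpoint=False)
    k = np.arange(1, K + 1)
    A, B = rng.normal(size=(2, K))
    X = ((A / np.sqrt(k))[:, None] * np.cos(np.outer(k, theta))
         + (B / np.sqrt(k))[:, None] * np.sin(np.outer(k, theta))).sum(axis=0)
    var = (1.0 / k).sum()  # Var X_K(theta), independent of theta
    density = np.exp(gamma * X - 0.5 * gamma**2 * var)
    return float(density.sum() * (2 * np.pi / n_grid))

# E[total mass] = 2*pi by the normalisation E[exp(gamma*X - gamma^2 Var/2)] = 1,
# so the empirical mean over many samples should be close to 2*pi.
rng = np.random.default_rng(0)
masses = [circle_gmc_mass(rng=rng) for _ in range(200)]
print(np.mean(masses))
```

With $\gamma = \sqrt{2s}$ in the $L^2$-regime, the empirical moments of such samples give a rough numerical counterpart to the quantities appearing in (A.1).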
For non-integer values of k, the expanded moment approach no longer works. One may want to rely on the weak convergence (A.2), but this distributional result per se does not immediately imply (A.1). Fortunately, the proof of (A.2) is based on a moment method [35, 45], and stronger conclusions may be obtained. For example, in the $L^2$-regime [45], where $s^2 < 1/2$, there exists a further sequence of random measures approximating $M_\gamma(d\theta)$ in a sufficiently strong sense. Using these ingredients, we may conclude (A.1) for any $1 \leq k \leq 2$, and the case $k \in (0, 1)$ is similar. This approach is generalised in [35] to the $L^1$-regime where $s^2 \in [1/2, 1)$, from which one can obtain (A.1) at least for all $k \leq 1$. Of course, these results do not cover the entire subcritical regime $k < 1/s^2$ in the asymptotic analysis of the moments of moments (1.1) of CUE matrices, but we expect that a more refined analysis along similar lines may make this possible, and that the problem is conceptually solved.

B GMC moments: renormalisation from below
Here we briefly mention the 'lower continuity' of GMC moments, up to renormalisation at the critical exponent, in the following sense.
By considerations similar to those in Section 3.2.1, we have (B.1). Given the uniform continuity of f on $D \times D$, it has a modulus of continuity $\omega(\cdot)$ satisfying $|f(x, u) - f(u, u)| \leq \omega(r)$ for all $x \in B(u, r)$ and $u \in D$, with $\omega(r) \to 0$ as $r \to 0^+$. By the same Gaussian comparison argument as in Lemma 3.3, (B.1) can be rewritten accordingly, and our claim follows by first applying Lemma 3.1 and then sending r to 0.

C Supercritical moments of approximate GMCs
Let us again consider $X_\epsilon := X * \nu_\epsilon$ for some mollifier $\nu \in C_c^\infty(\mathbb{R}^d)$ and study the integrand on the RHS of (C.1). Suppose we are in the 'supercritical' case where $p > 2d/\gamma^2$ (which may occur for any $\gamma \in \mathbb{R}$). Looking at (part of) the microscopic contribution, one can show by Lemma 2.18 that the mass at mesoscopic distances is negligible. Since $\delta > 0$ is arbitrary, a finer analysis would show that the leading coefficient depends only on the microscopic regime where $|x - u| = O(\epsilon)$.
In particular, the Gaussian field $X_\epsilon(s; u) := X_\epsilon(u + \epsilon s; u)$ no longer has a logarithmic singularity as $\epsilon \to 0^+$, and with a simple change of variables we see that (C.1) is asymptotically equal to a quantity whose leading order coefficient remains $\Omega(1)$ as $\epsilon \to 0^+$, but which is very sensitive to the choice of approximating fields $X_\epsilon$ (equivalently, of the mollifier $\nu$). In addition, the Gaussian quantity here cannot accurately capture the properties of e.g. random matrix models, whose microscopic statistics are described by the Sine process instead.

Let us also quickly comment on the 'critical-supercritical' case, i.e. $p = 2d/\gamma^2 \in (0, 1)$, by looking at the toy model of the exponential functional of Brownian motion:

$$\mathbb{E}\bigg[\bigg(\int_0^T e^{\gamma(B_t - \mu t)}\,dt\bigg)^{p-1}\bigg],$$
where we now have a Brownian motion with drift $-\mu = -\frac{\gamma}{2}(p-1) > 0$, so that the leading order contribution to the above integral should come from near $T = -(1+o(1))\log\epsilon$. In terms of the GMC asymptotics, this again suggests that the leading order depends heavily on the microscopic region.
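A back-of-the-envelope Laplace-type heuristic (ours, not spelled out in the source; recall $p - 1 < 0$ in this regime) for why the endpoint dominates:

```latex
\int_0^T e^{\gamma(B_t + |\mu| t)}\,dt
 \;\asymp\; e^{\gamma \max_{t \le T}(B_t + |\mu| t)}
 \;\approx\; e^{\gamma(B_T + |\mu| T)},
\qquad
\mathbb{E}\Big[ e^{\gamma(p-1)(B_T + |\mu| T)} \Big]
 = \exp\Big( \Big( \gamma(p-1)|\mu| + \tfrac{\gamma^2 (p-1)^2}{2} \Big) T \Big),
```

so with $T = -\log\epsilon$ the $(p-1)$-th moment behaves like a power of $\epsilon$ governed entirely by the behaviour near $t = T$, i.e. by the microscopic scale $|x - u| = O(\epsilon)$.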

D Slowly merging Fisher-Hartwig singularities
We now discuss why the strong asymptotics in Lemma 3.7 (i) continue to hold when we allow the singularities $\theta_1, \dots, \theta_k$ to possibly merge at a rate not faster than $N^{-(1-\delta)}$. Since this requires an inspection of the Riemann-Hilbert analysis that is largely independent of the main text of the current article, we shall follow the notation of [16, 17] as closely as possible in this appendix unless otherwise specified. Much of the following content is rather standard in this kind of study, and we shall only briefly explain the arguments with heavy reference to existing literature, where interested readers can find more details.
To begin with, it is helpful to have a quick recap of the proof of the strong asymptotics of Toeplitz determinants with Fisher-Hartwig singularities. The Toeplitz determinants are connected to orthogonal polynomials on the unit circle in the following way: there exists a dual pair of orthogonal polynomials $\phi_k(z) = \chi_k z^k + \dots$ and $\widehat{\phi}_k(z) = \widehat{\chi}_k z^k + \dots$
of degree k with positive leading coefficients $\chi_k, \widehat{\chi}_k$ for each $k \in \mathbb{N}$ such that (writing $z = e^{i\theta}$)
$$\frac{1}{2\pi}\int_0^{2\pi} \phi_k(z) z^{-j} f(z)\,d\theta = \chi_k^{-1}\delta_{jk}, \qquad \frac{1}{2\pi}\int_0^{2\pi} \widehat{\phi}_k(z) z^{j} f(z)\,d\theta = \widehat{\chi}_k^{-1}\delta_{jk}, \qquad j = 0, 1, \dots, k,$$
and it can be shown (see e.g. [17, Section 2]) that the Toeplitz determinants may be expressed in terms of these polynomials. The Riemann-Hilbert analysis kicks in when we try to encode these objects in matrix-valued functions: for each $k \in \mathbb{N}$, one can define a Riemann-Hilbert problem (RHP) whose entries involve $\phi_k$, $\widehat{\phi}_k$ and their regularised versions. In particular, the RHS of (D.1) only depends on the 'last two' RHPs $Y^{(n)}$ and $Y^{(n+1)}$. Provided that we can establish their asymptotics (including those of any derivatives appearing in the differential identity) as $n \to \infty$, we can obtain our desired formula by integrating both sides with respect to each variable $\gamma = \alpha_j$, $j = 1, \dots, m$, using the fact that $D_n(0, \dots, 0) = 1$.
So far everything we have discussed applies regardless of the behaviour of the singularities, and it all boils down to the asymptotic analysis of the Y-RHP. This relies on the method of non-linear steepest descent, first applied in the random matrix context in [18], which involves the following steps:
• The Y-RHP is transformed into the so-called T-RHP, which 'normalises the behaviour at infinity'; see e.g. [17, Equation (4.1)].
• The T -RHP is further transformed to a new S-RHP, a procedure known as opening of lenses, see e.g. [17,Equation (4.2)] and the discussion that follows it.
• The S-RHP is approximated by two sets of functions:
- the global parametrix N(z) in [17, Equations (4.7)-(4.10)] when z is away from the unit circle and not close to any of the singularities;
- a collection of functions $P_{z_j}(z)$, known as the local parametrices (see [17, Equation (4.16)] and other related definitions), each of which is defined on a small neighbourhood $U_{z_j}$ of $z_j = e^{i\theta_j}$, say $U_{z_j} := \{z \in \mathbb{C} : |z - z_j| < \epsilon\}$ for some $\epsilon > 0$ chosen such that the $U_{z_j}$'s are non-overlapping.
• The R-RHP [17,Equation (4.26)], which measures the 'multiplicative error' when the S-RHP is approximated by the global/local parametrices, is shown to be close to the identity matrix with other 'nice properties' as n → ∞.
When the singularities $z_1, \dots, z_m$ are fixed, the parameter $\epsilon > 0$ in the definition of $U_{z_j}$ may be chosen as a fixed number, and one can easily verify that the 'jump condition' for the R-RHP in [17] is of the form $R_+(z) = R_-(z)\Delta(z)$ with $\Delta(z) = I + O(n^{-1})$. Standard analysis then shows that the R-RHP has a unique solution and that $R - I$ admits an asymptotic expansion in powers of $n^{-1}$; see [17, Section 4.1] for more details.
When the singularities are allowed to vary but remain $n^{-(1-\kappa)}$-separated (with $\kappa > 0$) from each other, the first three steps of the steepest descent still go through, except that the local parametrices $P_{z_j}$ will now be defined on $U_{z_j}$ with, e.g., radius $\epsilon = \epsilon_n = \frac{1}{10}n^{-(1-\kappa)}$ in order to satisfy the non-overlapping condition. The analysis of the R-RHP with shrinking contours already appeared in [16, 17] for the 'extension to smooth V(z)', even though in the context of merging singularities it was perhaps first carried out in [14] and more recently in [12, 23]. The consequences of shrinking contours (in the notation of [16, 17]) are as follows. (For the regularised versions, see the discussion between Equations (3.16) and (3.17) in [17]; the separation distance $n^{-(1-\kappa)}$ here plays the role of $N^{-(1-\delta)}$ in the main text of the current paper.)
• The jump conditions in [17] remain true, but now the jump matrix
$$\Delta(z) := I + N(z)\begin{pmatrix} 0 & 0 \\ f(z)^{-1} z^{\mp n} & 0 \end{pmatrix} N(z)^{-1}$$
(where the exponent of z is negative if $z \in \Sigma^{\mathrm{out}}_j$ and positive if $z \in \Sigma''^{\,\mathrm{out}}_j$ for $j = 1, \dots, m$) is no longer $I + O(e^{-cn})$ for some small $c > 0$, but rather $I + O(e^{-cn^{c}})$, where $c > 0$ may possibly depend on $\kappa$ but can be made uniform on compact sets of $\alpha_1, \dots, \alpha_m$.
While $R(z) - I$ can no longer be expanded in an asymptotic series in powers of $n^{-1}$, the R-RHP is still a 'small-norm problem', and so for n sufficiently large it may still be solved via the Neumann series construction. In particular, one can still verify that $R(z) = I + o(1)$ and $\frac{\partial}{\partial\gamma}R(z) = o(1)$, and so all the leading order asymptotics for $\chi_n$, $Y^{(n)}$ and $Y^{(n+1)}$ and their regularised versions (as well as their derivatives) appearing in (D.1) remain identical to those observed in the fixed-singularity case. Hence the strong asymptotics for Toeplitz determinants with singularities may be extended to cover the slightly more general situation with a separation distance of $n^{-(1-\kappa)}$ for any $\kappa > 0$.