Fast mean-reversion asymptotics for large portfolios of stochastic volatility models

We consider an asymptotic SPDE description of a large portfolio model where the underlying asset prices evolve according to certain stochastic volatility models with default upon hitting a lower barrier. The asset prices and their volatilities are correlated through systemic Brownian motions, and the SPDE is obtained on the positive half-space along with a Dirichlet boundary condition. We study the convergence of the loss from the system, which is given in terms of the total mass of a solution to our stochastic initial-boundary value problem, under fast mean-reversion of the volatility. We consider two cases. In the first case, the volatilities are sped up towards a limiting distribution and the system converges only in a weak sense. On the other hand, when only the mean-reversion coefficients of the volatilities are allowed to grow large, we see a stronger form of convergence of the system to its limit. Our results show that in a fast mean-reverting volatility environment, we can accurately estimate the distribution of the loss from a large portfolio by using an approximate constant volatility model which is easier to handle.


Introduction
In this paper, our aim is to investigate the fast mean-reverting volatility asymptotics for an SPDE-based structural model for portfolio credit. SPDEs arising from large portfolio limits of collections of defaultable constant volatility models were initially studied in Bush et al. [5], and their regularity was further investigated in Ledger [21]. In Hambly and Kolliopoulos [15, 16, 17], we extended this work to a two-dimensional stochastic volatility setting, and here we consider the question of effective one-dimensional constant volatility approximations which arise by considering fast mean-reversion in the volatilities. This approach is to some extent motivated by the ideas analysed in Fouque et al. [11], but instead of option prices, we look at the systemic risk of large credit portfolios in the fast mean-reverting volatility setting.
The literature on large portfolio limit models in credit can be divided into two approaches based on either structural or reduced form models for the individual assets. Our focus will be on the structural approach, where we assume that we are modelling the financial health of the firms directly and default occurs when these health processes hit a lower barrier.
The reduced form setting assumes that the default of each firm occurs as a Poisson process and we model the default intensities directly. These can be correlated through systemic factors and through the losses from the portfolio. The evolution of the large portfolio limit of the empirical measure of the loss can be analysed as a law of large numbers and then Gaussian fluctuations derived around this limit; see Giesecke et al. [13,24,25,12] and Cvitanić et al. [6]. Further, the large deviations can be analysed; see Sowers and Spiliopoulos [26,27]. It is also possible to take an approach through interacting particle systems where each firm is in one of two states representing financial health and financial distress and there is a movement between states according to some intensity, often firm dependent, and dependent on the proportion of losses; see for instance Dai Pra and Tolotti [8] or Dai Pra et al. [7].
Our underlying set-up is a structural model for default in which each asset has a distance to default, which we think of as the logarithmically scaled asset price process. The asset price evolves according to a general stochastic volatility model, in which the distance to default $X^i$ of the $i$th asset satisfies the system (1.1) for all $i \in \mathbb{N}$. The coefficient vectors $C_i = (r_i, \rho_{1,i}, \rho_{2,i}, k_i, \theta_i, \xi_i)$ are picked randomly and independently from some probability distribution with $\rho_{1,i}, \rho_{2,i} \in [0, 1)$, the infinite sequence $((x^1, \sigma^{1,\mathrm{init}}), (x^2, \sigma^{2,\mathrm{init}}), \ldots)$ of random vectors in $\mathbb{R}^2$ is assumed to be exchangeable (that is, the joint distribution of every finite subset is invariant under permutations), and $g$, $h$ are functions for which we give suitable conditions later. The exchangeability condition implies (by de Finetti's theorem, see Kotelenez and Kurtz [19, Theorem 4.1] and Bernardo and Smith [1, Chap. 4] for a proof) the existence of a σ-algebra $\mathcal{G} \subseteq \sigma(\{(x^i, \sigma^{i,\mathrm{init}}) : i \in \mathbb{N}\})$, given which the two-dimensional random vectors $(x^i, \sigma^{i,\mathrm{init}})$ are pairwise independent and identically distributed. The idiosyncratic Brownian motions $W^i$, $B^i$ for $i \in \mathbb{N}$ are taken to be pairwise independent, and also independent of the systemic Brownian motions $W^0$, $B^0$, which have a constant correlation $\rho_3$. We regard this as a system for $Z^i = (X^i, \sigma^i)$. The infinitesimal generator of this two-dimensional process is defined for $f \in C^2(\mathbb{R}_+ \times \mathbb{R}; \mathbb{R})$, with diffusion matrix $A^i = (a^i_{jk})$ given by

$$A^i = \begin{pmatrix} h^2(\sigma) & h(\sigma)\,\xi_i\, g(\sigma)\,\rho_{1,i}\rho_{2,i}\rho_3 \\ h(\sigma)\,\xi_i\, g(\sigma)\,\rho_{1,i}\rho_{2,i}\rho_3 & \xi_i^2\, g(\sigma)^2 \end{pmatrix},$$

as $A^i = \Sigma^i R\, (\Sigma^i)^\top$, with $R$ the covariance matrix of the four-dimensional Brownian motion $(W^i, W^0, B^i, B^0)$. We can show that the empirical measure $\nu^N_t = \frac{1}{N}\sum_{i=1}^N \delta_{Z^i_t}$ of a sequence of finite sub-systems converges weakly as $N \to \infty$ (see [17]) to the probability distribution of $Z^1_t$ given $W^0$, $B^0$ and $\mathcal{G}$.
This measure consists of two parts: its restriction to the line x = 0 which is approximated by the restriction of ν N to this line, and its restriction to R + × R which possesses a two-dimensional density u(t, x, y). The density u(t, x, y) can be regarded as an average of solutions to certain two-dimensional SPDEs with a Dirichlet boundary condition on the line x = 0. In particular, we can write u = E[u C 1 | W 0 , B 0 , G], where u C 1 (t, x, y) is the probability density of Z 1 t given W 0 , B 0 , G and C 1 on R + × R, which satisfies, for any value of the coefficient vector C 1 , the two-dimensional SPDE du C 1 = A 1, * u C 1 dt + B 1, * u C 1 d(W 0 , B 0 ) , (1.2) where A 1, * is the adjoint of the generator A 1 of Z 1 and the operator B 1, * is given by The boundary condition is that u C 1 (t, 0, y) = 0 for all y ∈ R. In the special case where the coefficients are constants independent of i, u is itself a solution to the stochastic partial differential equation (1.2). One reason for studying the large portfolio limit is the need to have a useful approximation which captures the dynamics among the asset prices when the number of assets is large. Moreover, by studying the limit SPDE instead of a finite sub-system of (1.1), we can potentially provide a more efficient approach to capturing the key drivers of a large portfolio without having to simulate a large number of idiosyncratic Brownian paths.
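To make the cost comparison concrete, the following minimal sketch simulates a finite sub-system directly, which is the brute-force approach that the limiting SPDE is meant to avoid. Everything specific in it is an assumption chosen for illustration: the Euler time stepping, the Ornstein-Uhlenbeck choice g = 1, the choice h(s) = |s|, uncorrelated systemic factors (ρ_3 = 0), identical coefficients for all assets, the drift r − h²/2 for the distance to default, and all parameter values. The function name `simulate_loss_fraction` is hypothetical.

```python
import math
import random


def simulate_loss_fraction(n_assets=200, t_end=1.0, n_steps=100, r=0.02,
                           rho1=0.4, rho2=0.3, kappa=2.0, theta=0.3, xi=0.2,
                           x0=0.5, sigma0=0.3, seed=0):
    """Euler sketch of a finite sub-system; returns the defaulted fraction.

    Assumptions (illustrative only): g = 1, h(s) = |s|, rho_3 = 0,
    the same coefficients for every asset, absorption at x = 0 checked
    only on the time grid.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    sq = math.sqrt(dt)
    x = [x0] * n_assets        # distances to default
    sig = [sigma0] * n_assets  # volatility factors
    alive = [True] * n_assets
    for _ in range(n_steps):
        dw0 = sq * rng.gauss(0.0, 1.0)  # systemic increment driving prices
        db0 = sq * rng.gauss(0.0, 1.0)  # systemic increment driving volatilities
        for i in range(n_assets):
            if not alive[i]:
                continue  # defaulted assets stay at the barrier
            dwi = sq * rng.gauss(0.0, 1.0)  # idiosyncratic price noise
            dbi = sq * rng.gauss(0.0, 1.0)  # idiosyncratic volatility noise
            vol = abs(sig[i])
            x[i] += (r - 0.5 * vol * vol) * dt + vol * (
                math.sqrt(1.0 - rho1 ** 2) * dwi + rho1 * dw0)
            sig[i] += kappa * (theta - sig[i]) * dt + xi * (
                math.sqrt(1.0 - rho2 ** 2) * dbi + rho2 * db0)
            if x[i] <= 0.0:
                alive[i] = False
    return 1.0 - sum(alive) / n_assets
```

Each call draws two idiosyncratic increments per asset per time step, so the work grows linearly in the number of assets. Monitoring the barrier only on the grid also tends to miss some crossings, so the returned fraction is a rough lower estimate at best.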
Of central importance will be the loss process $L$, the value of which at each time $t > 0$ is given by $L_t = \mathbb{P}[X^1_t = 0 \mid W^0, B^0, \mathcal{G}]$, i.e., the mass on the line $x = 0$ of the probability distribution of $Z^1_t$ given $W^0$, $B^0$ and $\mathcal{G}$. This quantity is approximated, as $N \to \infty$, by the mass of $\nu^N_t$ on the line $x = 0$, which is equal to the proportion of defaulted assets at time $t$ in the finite sub-system of size $N$, and thus it measures the total loss in the large portfolio limit at time $t$. The distribution of $L_t$ is a simple measure of risk for the portfolio of assets and can be used to find the probability of a large loss, or to determine the prices of portfolio credit derivatives such as CDOs that can be written as expectations of suitable functions of $L$. Thus our focus will be on estimating probabilities of the form (1.3) for some $0 \le a < b \le 1$, that is, the probability that the total loss from the portfolio lies within a certain range. Probabilities of this form can be approximated numerically with a simulated sample of values of $L_t$, obtained via (1.4) after solving the SPDE (1.2) for $u^{C_1}$ numerically, for a sample $\{c_{1,1}, \ldots, c_{1,n}\}$ of values of the vector $C_1$. In the special case when asset prices are modelled as simple constant volatility models, the numerics (see Giles and Reisinger [14] or Bujok and Reisinger [4] for jump-diffusion models) have a significantly smaller computational cost, which motivates the investigation of the existence of accurate approximations using a constant volatility setting in the general case. We also note that one-dimensional SPDEs describing large portfolio limits in constant volatility environments have been found to have a unique regular solution (see Bush et al. [5] or Hambly and Ledger [18] for a loss-dependent correlation model). This is an important component of the numerical analysis, and a counterpoint to the fact that we have been unable to establish uniqueness of solutions to the two-dimensional SPDE arising in the CIR volatility case [15].
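As a rough illustration of how a range probability for the loss can be estimated from a simulated sample, the sketch below uses a constant volatility portfolio with one systemic factor and a finite number of assets in place of the SPDE solver. The closed interval convention, the discrete barrier monitoring, the shared coefficients and all parameter values are assumptions made here, and the names `sample_losses` and `range_probability` are hypothetical.

```python
import math
import random


def sample_losses(n_scenarios=200, n_assets=100, t_end=1.0, n_steps=20,
                  r=0.0, sigma=0.3, rho=0.5, x0=0.3, seed=1):
    """Sample the defaulted fraction at t_end over independent scenarios.

    Constant volatility sketch: each distance to default moves by
    (r - sigma^2/2) dt + sigma (sqrt(1 - rho^2) dW^i + rho dW^0)
    and is absorbed at zero (checked on the time grid only).
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    sq = math.sqrt(dt)
    idio = sigma * math.sqrt(1.0 - rho * rho) * sq
    syst = sigma * rho * sq
    drift = (r - 0.5 * sigma * sigma) * dt
    losses = []
    for _ in range(n_scenarios):
        x = [x0] * n_assets
        alive = [True] * n_assets
        for _ in range(n_steps):
            common = drift + syst * rng.gauss(0.0, 1.0)  # shared by all assets
            for i in range(n_assets):
                if alive[i]:
                    x[i] += common + idio * rng.gauss(0.0, 1.0)
                    if x[i] <= 0.0:
                        alive[i] = False
        losses.append(1.0 - sum(alive) / n_assets)
    return losses


def range_probability(losses, a, b):
    """Empirical probability that the loss lies in [a, b]."""
    return sum(1 for loss in losses if a <= loss <= b) / len(losses)
```

A call such as `range_probability(sample_losses(), 0.1, 0.3)` then gives a Monte Carlo estimate whose accuracy is limited by both the number of scenarios and the finite portfolio size.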
We derive our one-dimensional approximations under two different settings with fast mean-reverting volatility. In what we call the large vol-of-vol setting, the mean-reversion and volatility in the second equation in (1.1) are scaled by suitable powers of $\epsilon$ in that $k_i = \kappa_i/\epsilon$ and $\xi_i = v_i/\sqrt{\epsilon}$, and then we take $\epsilon \to 0$. This is distributionally equivalent to speeding up the volatility processes by replacing the time $t$ by $t/\epsilon$, when $\epsilon$ is small. Our aim is to take the limit as $\epsilon \to 0$, so that when the system of volatility processes is positive recurrent, averages over finite time intervals involving the sped-up volatility processes will approximate the corresponding stationary means. In the limit, we obtain a constant volatility large portfolio model which could be used as an effective approximation when volatilities are fast mean-reverting. However, this speeding up does not lead to strong convergence of the volatility processes, allowing only weak convergence of our system, which can only be established when $\rho_3 = 0$ (effectively separating the time scales) and when all the volatility processes have the same coefficients.

The case of small vol-of-vol has the mean-reversion in the second equation in (1.1) scaled by $\epsilon$ in that $k_i = \kappa_i/\epsilon$, with the vol-of-vols left unscaled. We regard this case as a small noise perturbation of the constant volatility model, where volatilities have stochastic behaviour but are pulled towards their mean as soon as they move away from it due to a large mean-reverting drift. When $\epsilon \to 0$, the drifts of the volatilities tend to infinity and dominate the corresponding diffusion parts since the vol-of-vols remain small, allowing the whole system to converge to a constant volatility setting in a strong sense. This strong convergence allows the rate of convergence of probabilities of the form (1.3) to be estimated and gives us a quantitative measure of the loss in accuracy in the estimation of these probabilities when a constant volatility large portfolio model is used to replace a more realistic stochastic volatility perturbation of that model.

In Sects. 2 and 3, we present our main results for both settings. The results are then proved in Sects. 4 and 5. Finally, the proofs of two propositions showing the positive recurrence, and hence the applicability of our results, for two classes of models can be found in the Appendix.
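The averaging effect behind the large vol-of-vol setting can be checked numerically in the simplest case. The sketch below assumes an Ornstein-Uhlenbeck volatility (g = 1) with mean-reversion κ/ε and vol-of-vol v/√ε, samples it with the exact Gaussian transition on a fine grid, and compares the time average of σ² over [0, 1] with the stationary mean θ² + v²/(2κ). The test function σ², the parameter values and the names `time_average_square` and `stationary_mean_square` are illustrative assumptions.

```python
import math
import random


def time_average_square(eps, t_end=1.0, kappa=1.0, theta=0.3, v=0.2,
                        steps_per_relaxation=10, seed=0):
    """Time average of sigma_s^2 over [0, t_end] for the sped-up OU process
    with mean-reversion kappa/eps and vol-of-vol v/sqrt(eps)."""
    rng = random.Random(seed)
    k = kappa / eps
    xi = v / math.sqrt(eps)
    dt = eps / (kappa * steps_per_relaxation)  # resolve the time scale eps/kappa
    n = int(round(t_end / dt))
    decay = math.exp(-k * dt)
    # Exact standard deviation of the OU transition over one step.
    sd = xi * math.sqrt((1.0 - decay * decay) / (2.0 * k))
    sigma = theta
    total = 0.0
    for _ in range(n):
        total += sigma * sigma
        sigma = theta + (sigma - theta) * decay + sd * rng.gauss(0.0, 1.0)
    return total / n


def stationary_mean_square(kappa=1.0, theta=0.3, v=0.2):
    """Second moment of the stationary law N(theta, v^2 / (2 kappa))."""
    return theta ** 2 + v ** 2 / (2.0 * kappa)
```

Note that the stationary variance ξ²/(2k) = v²/(2κ) does not depend on ε under this scaling, which is why the volatility itself does not converge strongly even though its time averages do.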

The main results: large vol-of-vol setting
We begin with the study of the fast mean-reversion/large vol-of-vol setting, for which we need to assume that the correlation $\rho_3$ of $W^0$ and $B^0$ is zero. When $g$ is either the square root function or a function behaving almost like a positive constant for large values of the argument, it has been proved in [15, Theorem 4.3] and in [17, Theorem 4.1], respectively, that $u^{C_1}$ admits a representation in which $p_t$ is the density of each volatility path when the path of $B^0$ is given, and $u_0$ is the density of each $x^i$ given $\mathcal{G}$. In that expression for the two-dimensional density $u^{C_1}(t, x, y)$, averaging happens with respect to the idiosyncratic noises, and since we are interested in probabilities concerning $L_t$, which is computed by substituting that density in (1.4), averaging happens with respect to the market noise $(W^0, B^0)$ as well. Therefore, we can replace $(W^i, B^i)$ for all $i \ge 0$ in our system by objects having the same joint law. In particular, setting $k_i = \kappa_i/\epsilon$ and $\xi_i = v_i/\sqrt{\epsilon}$, the $i$th asset's distance to default $X^{i,\epsilon}$ satisfies the system (1.1) with these scaled coefficients, where the superscripts $\epsilon$ are used to underline the dependence on $\epsilon$. If we substitute $t = \epsilon t'$ and $s = \epsilon s'$ for $0 \le s' \le t'$ and then replace $(W^i_{\epsilon \cdot}, B^i_{\epsilon \cdot})$ by $(\sqrt{\epsilon}\, W^i, \sqrt{\epsilon}\, B^i)$ for all $i \ge 0$, which have the same joint law, the SDE satisfied by the $i$th volatility process becomes

$$d\sigma^{i,\epsilon}_{\epsilon t'} = \kappa_i\big(\theta_i - \sigma^{i,\epsilon}_{\epsilon t'}\big)\,dt' + v_i\, g\big(\sigma^{i,\epsilon}_{\epsilon t'}\big)\Big(\sqrt{1 - \rho_{2,i}^2}\; dB^i_{t'} + \rho_{2,i}\, dB^0_{t'}\Big).$$

This shows that $\sigma^{i,\epsilon}_{\cdot}$ can be replaced by $\sigma^{i,1}_{\cdot/\epsilon}$ for all $i \ge 1$, i.e., the $i$th volatility process of our model when the mean-reversion coefficient and the vol-of-vol are equal to $\kappa_i$ and $v_i$, respectively, and when the time $t$ is replaced by $t/\epsilon$, speeding up the system of the volatilities when $\epsilon$ is small. If $g$ is now chosen so that the system of volatility processes becomes positive recurrent, averages over finite time intervals converge to the corresponding stationary means as the speed tends to infinity, i.e., as $\epsilon \to 0+$, which is the key for the convergence of our system. We give a definition of the required property for $g$.

Definition 2.1 We fix the distribution from which each $C_i = (r_i, \rho_{1,i}, \rho_{2,i}, \kappa_i, \theta_i, v_i)$ is chosen and denote by $\mathcal{C}$ the σ-algebra generated by all these coefficient vectors. Then we say that $g$ has the positive recurrence property when the two-dimensional process $(\sigma^{i,1}_\cdot, \sigma^{j,1}_\cdot)$ is a positive recurrent diffusion for any two $i, j \in \mathbb{N}$, for almost all values of $C_i$ and $C_j$. This means that given $\mathcal{C}$, there exists a two-dimensional random variable $(\sigma^{i,j,1,*}, \sigma^{i,j,2,*})$ whose distribution is stationary for $(\sigma^{i,1}_\cdot, \sigma^{j,1}_\cdot)$, and whenever $\mathbb{E}[|F(\sigma^{i,j,1,*}, \sigma^{i,j,2,*})| \mid \mathcal{C}]$ exists and is finite for some measurable function $F : \mathbb{R}^2 \to \mathbb{R}$, we also have

$$\lim_{T \to \infty} \frac{1}{T}\int_0^T F\big(\sigma^{i,1}_s, \sigma^{j,1}_s\big)\,ds = \mathbb{E}\big[F(\sigma^{i,j,1,*}, \sigma^{i,j,2,*}) \,\big|\, \mathcal{C}\big],$$

or equivalently, after a change of variables,

$$\lim_{\epsilon \to 0+} \frac{1}{t}\int_0^t F\big(\sigma^{i,1}_{s/\epsilon}, \sigma^{j,1}_{s/\epsilon}\big)\,ds = \mathbb{E}\big[F(\sigma^{i,j,1,*}, \sigma^{i,j,2,*}) \,\big|\, \mathcal{C}\big]$$

for any $t \ge 0$, $\mathbb{P}$-almost surely.
The positive recurrence property is a prerequisite for our convergence results to hold, and now we state two propositions which give us a few classes of models for which this property is satisfied. The first shows that for the Ornstein-Uhlenbeck model (g(x) = 1 for all x ∈ R), we always have the positive recurrence property. The second shows that for the CIR model (g(x) = √ |x| for all x ∈ R), we have the positive recurrence property provided that the random coefficients of the volatilities satisfy certain conditions. The proofs of both propositions can be found in the Appendix.

Proposition 2.2
Suppose that g is a differentiable function, bounded from below by some c g > 0. Suppose also that g (x)κ i (θ i − x) < κ i g(x) + v i 2 g (x)g 2 (x) for all x ∈ R and i ∈ N, for all possible values of C i . Then g has the positive recurrence property.
where the functiong is a continuously differentiable, strictly positive and increasing function taking values in [c g , 1] for some c g > 0. Then there exists an η > 0 such that g has the positive recurrence property when for all i, j ∈ N, P-almost surely.
which is a deterministic vector in R 4 , the function h is bounded, and g has the positive recurrence property, in which case we have . Consider now the one-dimensional large portfolio model where the distance to default X i, * of the ith asset evolves in time according to the system

Remark 2.5
Since all volatility processes have the same stationary distribution, a simple application of the Cauchy-Schwarz inequality shows thatσ ≤ σ 2,1 , which implies The above theorem gives only weak convergence and only under the restrictive assumption of having the same coefficients in each volatility. For this reason, we also study the asymptotic behaviour of our system from a different perspective. In particular, we fix the volatility path σ 1,1 and the coefficient vectors C i and study the convergence of the solution u (t, x) to the SPDE (2.1) in the sped-up setting, i.e., which is used to compute the loss L t . We now write E σ,C to denote the expectation given the volatility path σ 1,1 and the C i which we have fixed, and L 2 σ,C to denote the corresponding L 2 -norms. By part 2 of Theorem 4.1 in [15], the solution u to the above SPDE satisfies the identity which shows that the L 2 (R + )-norm of u (t, ·) and the L 2 ([0, T ] × R + )-norm of u are both uniformly bounded by a random variable which has a finite L 2 σ,C ( )-norm (the initial data assumptions made in [15] are also needed for this to hold), for all 0 ≤ t ≤ T . Therefore, since L 2 -spaces are reflexive by Brézis [3,Theorem 4.10], [3, Theorem 3.18] tells us that for a given sequence of values of tending to zero, we can always find a subsequence ( n ) n∈N and an element u The characterisation of the weak limits u * is given in the following theorem.

Theorem 2.6
Suppose that g has the positive recurrence property and that |h(x)| ≤ C for all x ∈ R, for some C > 0. Then whenever we have that u n → u * weakly in has a unique solution. In that case, since ( n ) n∈N can be taken to be a subsequence of an arbitrary sequence of values of tending to zero, we have that u converges weakly in It is not hard to see that the limiting SPDE (2.4) obtained in Theorem 2.6 corresponds to a constant volatility large portfolio model like the one given in Theorem 2.4 under the assumption that 1 . This indicates that the convergence of the loss L t can only be established in a weak sense, as in general we will haveσ > σ 1,1 and thusρ 1,i > ρ 1,i for all i. This is stated explicitly in the next proposition and its corollary.

The main results: small vol-of-vol setting
We now proceed to the small vol-of-vol setting, where only the volatility drifts are scaled by $\epsilon$, i.e., $k_i = \kappa_i/\epsilon$ for all $i$. This leads to the model where the $i$th asset's distance to default satisfies the system (1.1) with $k_i$ replaced by $\kappa_i/\epsilon$. The main feature of this model is that when the random coefficients and the function $g$ satisfy certain conditions, the $i$th volatility process $\sigma^{i,\epsilon}$ converges in a strong sense to the $\mathcal{C}$-measurable mean $\theta_i$ as $\epsilon \to 0+$ for all $i \in \mathbb{N}$, and we can also determine the rate of convergence. The required conditions are the following, and they are assumed to hold throughout the rest of this section:
1) The i.i.d. random variables $\sigma^i$, $\xi_i$, $\theta_i$, $\kappa_i$ take values in some compact subinterval of $\mathbb{R}$, with each $\kappa_i$ being bounded from below by some deterministic constant $c_\kappa > 0$.
3) Both the function $h$ and its derivative have polynomial growth.
Under the above conditions, the convergence of each volatility process to its mean is given in the following proposition.
The reason for having only weak convergence of our system in the large vol-of-vol setting was the fact that the limiting quantities $\sigma_{1,1}$, $\sigma_{2,1}$ and $\tilde{\sigma}$ did not coincide. On the other hand, Proposition 3.1 implies that the corresponding limits in the small vol-of-vol setting are equal, allowing us to hope for our system to converge in a stronger sense.
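For intuition about why the small vol-of-vol scaling gives strong convergence, consider the Ornstein-Uhlenbeck special case g = 1 with mean-reversion κ/ε and fixed vol-of-vol ξ. This special case is an assumption made here for illustration and is not claimed to cover the general conditions of this section. The mean-square distance of the volatility from θ then has the closed form (σ_0 − θ)² e^{−2κt/ε} + (ξ² ε / (2κ)) (1 − e^{−2κt/ε}), which is of order ε for any fixed t > 0. The function name `ou_mean_square_deviation` and the parameter values are hypothetical.

```python
import math


def ou_mean_square_deviation(eps, t, kappa=1.0, xi=0.2, theta=0.3, sigma0=0.5):
    """E[(sigma_t - theta)^2] for d sigma = (kappa/eps)(theta - sigma) dt + xi dB.

    Closed form for the OU case only (g = 1); the first term is the decaying
    initial displacement, the second the variance built up by the noise.
    """
    k = kappa / eps
    decay = math.exp(-2.0 * k * t)
    return (sigma0 - theta) ** 2 * decay + xi ** 2 / (2.0 * k) * (1.0 - decay)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        msd = ou_mean_square_deviation(eps, t=1.0)
        # The ratio msd / eps settles near xi^2 / (2 kappa) as eps shrinks.
        print(eps, msd, msd / eps)
```

In contrast with the large vol-of-vol scaling, the stationary variance ξ²ε/(2κ) here vanishes with ε, so the volatility path itself collapses onto its mean.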
Let u be the solution to the SPDE (2.1) in the small vol-of-vol setting, where we have fixed the volatility paths and the random coefficients. Working as in the proof of Theorem 2.3, it is possible to establish similar asymptotic properties for the SPDE (3.1) as → 0+. However, we are going to work with the antiderivative v 0, defined by v 0, (t, x) = +∞ x u (t, y) dy for all t, x ≥ 0 which satisfies the same SPDE but with different initial and boundary conditions. This is more convenient since the loss given W 0 , B 0 and G (that is, the average over all possible volatility paths and coefficient values), while the convergence of v 0, can be established in a much stronger sense and without the need to assume that W 0 and B 0 are uncorrelated. Our main result is stated below. Bush et al. [5] and Ledger [21]), which arises from the constant volatility model The SPDE (3.2) corresponds to the model (3.3) in the sense that given the loss L t , the mass 1 − L t of non-defaulted assets equals In order to estimate the rate of convergence of probabilities of the form (1.3), we consider the approximation error for x ∈ [0, 1] and determine its order of convergence.  We now write Y i, for the ith asset's distance to default in the sped-up volatility setting when the stopping condition at zero is ignored, that is,

Corollary 3.3 For any
Next, for each i, we write Y i, * for the process X i, * when the stopping condition at zero is ignored, that is,  in distribution as → 0+ (since the probability that any of the m minima equals zero is zero, as the minimum of any Gaussian process is always continuously distributed, while Y i, is obviously Gaussian for any given path of σ i,1 ). Let for some s 1 , s 2 ∈ [0, t] and 1 ≤ i 1 , i 2 ≤ m, and without loss of generality, we may assume that the difference inside the last absolute value is nonnegative. Moreover, we have and thus min 1≤i≤m Clearly, max 1≤i≤m p i (·(t)) defined on C([0, t]; R m ) is also continuous (as the maximum of finitely many evaluation functionals). Thus, our problem is finally In order to show the convergence in distribution, we first establish that a limit in distribution exists as → 0+ by using a tightness argument, and then we characterise the limits of the finite-dimensional distributions. To show tightness of the laws of (Y 1, , Y 2, , . . . , Y m, ) for ∈ R + , which implies the desired convergence in distribution, we recall a special case of Ethier and Kurtz [9, Theorem 3.7.2] for continuous processes, according to which it suffices to prove that for a given η > 0, there exist some δ > 0 and N > 0 such that for all > 0. (4.4) can easily be achieved for some very large N > 0, since x 2 , . . . , x m ), which is independent of and almost surely finite (the sum over n ∈ N of the probabilities that the norm of this vector belongs to [n, n + 1] is a convergent series and thus the same sum but for n ≥ N tends to zero as N tends to infinity). For (4.5), observe that | · | R m can be any of the standard equivalent L p -norms on R m , and we choose it to be L ∞ . 
Then we have P sup The first of the last two probabilities is clearly zero for δ < η 2(r+M) , while the second one can also be made arbitrarily small for small enough δ since by a well-known result about the modulus of continuity of Brownian motion (see Mörters and Peres [22, Theorem 1.14]), the supremum within that probability converges almost surely (and so also in probability) to 0 as fast as M 2δ ln 1 M 2 δ . Using these in (4.6), we deduce that (4.5) is also satisfied and we have the desired tightness result, which implies that (Y 1, · , . . . , Y m, · ) converges in distribution to some limit (Y 1,0 · , . . . , Y m,0 · ) along some subsequence. To conclude our proof, we need to show that (Y 1,0 , . . . , Y m,0 ) coincides with (Y 1, * , . . . , Y m, * ). But both m-dimensional processes are uniquely determined by their finite-dimensional distributions, and evaluation functionals on C([0, t]; R m ) preserve convergence in distribution (as continuous linear functionals). So we only need to show that for any fixed (i 1 , . . . , i ) ∈ {1, . . . , m} , any fixed (t 1 , . . . , t ) ∈ (0, +∞) and any fixed continuous bounded q : R → R, for an arbitrary ∈ N, we have as → 0+. By dominated convergence, this follows if we can show that ) and C, we only need to show that as → 0+, the mean vector and the covariance matrix of (Y i 1 , t 1 , . . . , Y i , t ) converge to the mean vector and the covariance matrix of (Y i 1 , * t 1 , . . . , Y i , * t ), respectively. Given (σ i 1 ,1 · , . . . , σ i ,1 · ), C and a k ∈ {1, 2, . . . , }, the kth coordinate of the mean vector ) ds, and by the positive recurrence property, this converges to 2 )t k as → 0+ (since the volatility processes all have the same coefficients and thus the same stationary distributions), which is the kth coordinate of the mean vector of (Y i 1 This means that for i p = i q = i ∈ {1, 2, . . . 
, m}, we need to show that as → 0+, while for i p = i q , we need to show that as → 0+, whereρ 1,i σ 2,1 = ρ 1,iσ for all i ≤ m. Both results follow from the positive recurrence property forσ = E[h(σ i p ,i q ,1, * )h(σ i p ,i q ,2, * )], which does not depend on i p and i q since the volatility processes all have the same coefficients and thus the same joint stationary distributions. This concludes the proof.
Proof of Theorem 2.6 Let V be the set of F W 0 -adapted, square-integrable semimartingales on [0, T ]. Thus for any (V t ) 0≤t≤T ∈ V, there exist two F W 0 -adapted and square-integrable processes (V 1,t ) 0≤t≤T and (V 2,t ) 0≤t≤T such that for 0 ≤ t ≤ T . The processes of the above form for which (v 1,t ) 0≤t≤T and (v 2,t ) 0≤t≤T are simple processes, that is, for all 0 ≤ t ≤ T and i ∈ {1, 2}, with each F i being F W 0 t 1 -measurable, span a linear subspaceṼ which is dense in V for the L 2 -norm. By using the boundedness of h and then the estimate (2.3), for any p > 0 and any T > 0, we obtain It follows that any sequence n → 0+ always has a subsequence ( k n ) n∈N such that h p (σ 1,1 · )u kn (·, ·) converges weakly to some u p (·, ·) in L 2 σ,C ([0, T ] × R + × ) for p ∈ {1, 2}. Testing (2.2) against an arbitrary smooth and compactly supported function f : R + → R, using Itô's formula for the product of R + u (·, x)f (x) dx with a process V ∈Ṽ having the form (4.7), (4.8) and finally taking expectations, we find for all t ≤ T . Upon setting = k n and taking n → ∞, the weak convergence results above yield for all 0 ≤ t ≤ T . The convergence of the terms on the right-hand side of (4.10) holds pointwise in t, while the term on the left-hand side converges weakly. Since we can easily find uniform bounds for all terms in (4.10) (by using (4.9)), dominated convergence implies that all the weak limits coincide with the corresponding pointwise limits, which gives (4.11) as a limit of (4.10) both weakly and pointwise in t. It is clear then that Next, we can check that the expectation for both i = 1 and i = 2, both weakly and pointwise in t ∈ [0, T ], while the limits are also differentiable in t everywhere except in the two jump points t 1 and t 2 . This follows because everything is zero outside [t 1 , t 2 ], while both v 1 and v 2 are constant in t and thus of the form (4.7), (4.8) if we restrict to that interval. 
Subtracting from each term of (4.10) the same term but with u replaced by u * and then adding it back, we can rewrite this identity as Then we have which tends to zero (when = k n and n → ∞) by dominated convergence, since the quantity inside the last integral converges to zero pointwise and can be dominated by using (4.9). The same argument is used to show that the fourth and sixth terms in (4.12) also tend to zero along the same subsequence. Finally, for any term of the form for p, m ∈ {0, 1, 2}, we recall the differentiability of the second factor inside the dsintegral (which was mentioned earlier) and then integrate by parts to write it as which converges by the positive recurrence property to the quantity Using integration by parts once more, this last expression is equal to This last convergence result also holds if we replace V by v 1 or v 2 , as we can show by following exactly the same steps in the subinterval [t 1 , t 2 ] (where v i is supported for i ∈ {1, 2} and where we have differentiability that allows integration by parts).
If we now set = k n in (4.12), take n → ∞ and substitute all the above convergence results, we obtain SinceṼ is dense in V, for a fixed t ≤ T , we can have (4.13) for any square-integrable martingale (V s ) 0≤s≤t , for which we have v 1,s = 0 for all 0 ≤ s ≤ t. Next, we denote by R u (t, x) the right-hand side of (2.4). Using then Itô's formula for the product of dx from both sides, taking expectations and finally substituting from (4.13), we find that for our fixed t ≤ T . Using the martingale representation theorem, V s can be taken and this implies V t = I E t , allowing us to write for any 0 ≤ t ≤ T . If we integrate the above over t ∈ [0, T ], we obtain that where the quantity inside the expectation is always nonnegative and becomes zero only when I E t = 0. This implies R + R u (t, x)f (x) dx ≤ R + u * (t, x)f (x) dx almost everywhere, and working in the same way with the indicator of the complement I E c t , we can deduce the opposite inequality as well. Thus we must have x)f (x) dx almost everywhere, and since the function f is an arbitrary smooth function with compact support, we can deduce that R u coincides with u * almost everywhere, which gives (2.4).
If h is bounded from below, we can use (2.3) to obtain a uniform (independent of ) bound for the H 1 0 (R + ) ⊗ L 2 σ,C ( × [0, T ])-norm of u n , which implies that along a further subsequence, the weak convergence to u * also holds in that Sobolev space, in which (2.4) has a unique solution; see [5]. The proof is now complete.

Proof of Proposition 2.7
The upper bound can be obtained by a simple Cauchy-Schwarz inequality, writing This calculation shows that this bound is only attainable when σ i,j,1, * = σ i,j,2, * for all i and j with i = j , and this happens only when all the assets share a common stochastic volatility (i.e., ρ 2 = 1). For the lower bound, considering our volatility processes for i = 1 and i = 2 started from their one-dimensional stationary distributions independently, we have for any t, ≥ 0 that since σ 1,1 and σ 2,1 are identically distributed, and also independent when B 0 is given. Taking → 0+ in (4.14) and using the positive recurrence property, the definition ofσ and dominated convergence on the left-hand side (since the quantity inside the expectation there is bounded by the square of an upper bound of h), we obtain the lower bound, i.e.,σ ≥ σ 1,1 , which can also be shown to be unattainable in general. Indeed, if we choose h such that its compositionh with the square function is strictly increasing and convex, and if g is chosen to be a square-root function (thus we are in the CIR volatility case), for any α > 0, we have Let σ ρ be the solution to the SDE Then σ ρ can be shown to be the square root of a CIR process having the same meanreversion and vol-of-vol as σ 1,1 and a different stationary mean, and which satisfies the Feller condition for not hitting zero at a finite time. If we have σ ρ t 1 > σ B 0 t 1 for some t 1 > 0, we consider t 0 = sup{s ≤ t 1 : σ ρ s = σ B 0 s } which is obviously nonnegative.
By the positive recurrence of σ ρ (which is the root of a CIR process, the ergodicity of which can be deduced from Proposition 2.3), the right-hand side of the above converges to α 2 P[σ ρ, * ≥h −1 (α + σ 1,1 )] as → 0+, where σ ρ, * has the stationary distribution of σ ρ . This expression can only be zero when σ ρ, * is a constant, and since the square of σ ρ satisfies Feller's boundary condition, this can only happen when ρ 2 = 0. In that case, we can easily check that σ 1,2,1, * and σ 1,2,1, * are independent, which implies thatσ = σ 1,1 . This completes the proof.
Proof of Corollary 2.8 Let us suppose that P[X 1, in probability, under the assumptions of both Theorems 2.4 and 2.6. The same convergence then holds in a strong L 2 -sense for some sequence n ↓ 0, since it will hold P-a.s. for some sequence and then we can apply dominated convergence. Therefore, the same convergence must hold weakly in L 2 as well. However, assuming for simplicity that (r i , ρ 1,i ) is also a constant vector (r, ρ 1 ) for all i and fixing a sufficiently integrable and σ (W 0 · , B 0 · ) ∩ G-measurable random variable , by Theorem 2.6, we have where for each i we define with ρ 1 = ρ 1 σ 1,1 σ 2,1 , in which the density of X i,w t given W 0 and G is the unique solution u * to (2.4); see [18]. Therefore, by the uniqueness of a weak limit, we must have , G] P-almost surely, which cannot be true for any interval I as otherwise the processes X 1,w · and X 1, * · would coincide, which is clearly not the case here. Indeed, this can only be true whenρ 1,1 = ρ 1 , which is equivalent toσ = σ 1,1 , and by Proposition 2.7, this is generally not the case unless ρ 2 = 0.

Proofs: small vol-of-vol setting
We now proceed to the proofs of Proposition 3.1, Theorem 3.2 and Corollary 3.3, the main results of Sect. 3.
Proof of Proposition 3.1 First, we show that each volatility process has a finite 2p-moment for any p ∈ N. Indeed, we fix a p ∈ N and consider the sequence of stopping times (τ n, ) n∈N , where τ n, = inf{t ≥ 0 : σ i, t > n}. With σ i,n, t = σ i, t∧τ n, , Itô's formula gives Now using Itô's formula for the L 2 -norm (see Krylov and Rozovskii [20]), given the volatility path and C, we obtain where N(t, ) is some noise due to the correlation between B 0 and W 0 , satisfying E[N(t, )] = 0. In particular, since we could have written W 0 = 1 − ρ 2 3 V 0 + ρ 3 B 0 for some Brownian motion V 0 independent from B 0 , we have Next, we can apply part 2 of Theorem 4.1 in [15] to the SPDE (3.1) to find v 0, for all s ≥ 0. Using this expression, we obtain the estimate σ,C ( ×R + ) ds, (5.2) and in the same way σ,C ( ×R + ) ds (5.4) for some η > 0. Moreover, we have the estimate σ,C ( ×R + ) ds, (5.5) and by using v 0, and M, m > 0 are constants independent of the fixed volatility path. Taking expectations in (5.7) to average over all volatility paths, we find that and using Gronwall's inequality on the above, we finally obtain for some σ 1, s, * lying between θ 1 and σ 1, s , with for some λ 1 , λ 2 > 0 and some m ∈ N, which allows us to bound the right-hand side of (5.8) by a linear combination of terms of the form σ 1, which are all O( ) as → 0+ by Proposition 3.1. The proof of the theorem is now complete.

Proof of Corollary 3.3 Let
Next, for any $\eta > 0$, we have and if $\mathcal{S}$ denotes the σ-algebra generated by the volatility paths, since $X^{1,*}_t$ is independent of $\mathcal{S}$ and the path of $B^0$, using the Cauchy-Schwarz inequality gives where the last equality above follows by using Morrey's inequality in dimension 1 (see Evans [ Therefore (5.10) gives for any $\eta > 0$, and in a similar way we can obtain . Using these two expressions in (5.9) and taking $\eta = \epsilon^p$ for some $p > 0$, we finally obtain which becomes optimal as $\epsilon \to 0^+$ when $1 - 2p = p$, i.e., for $p = \frac{1}{3}$. This gives the desired result.
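The choice $p = \frac{1}{3}$ can be illustrated with a short exact computation. The sketch below rests on an assumption that is not spelled out in the text above: that the final bound is a sum of two terms of orders $\epsilon^{1-2p}$ and $\epsilon^{p}$, as the balancing condition $1 - 2p = p$ suggests. Under that assumption, the slower of the two terms governs the rate, so we look for the $p$ that maximises $\min(1 - 2p, p)$. The helper `rate_exponent` and the grid are hypothetical names introduced for illustration.

```python
from fractions import Fraction

def rate_exponent(p):
    """Exponent of the slower term, assuming a bound of the form eps^(1-2p) + eps^p."""
    return min(1 - 2 * p, p)

# Exact rational grid of candidate exponents p in (0, 1/2).
grid = [Fraction(k, 300) for k in range(1, 150)]

best_p = max(grid, key=rate_exponent)
best_rate = rate_exponent(best_p)
print(best_p, best_rate)
```

On this grid the maximiser is the point where the two exponents coincide, in agreement with the balancing argument in the proof.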
4) There exist $z \in \mathbb{R}^n$ and $r_0 > 0$ such that $\int_{r_0}^{\infty} e^{-I_{z,r_0}(r)}\, dr = \infty$, $\int_{r_0}^{\infty} \frac{1}{\alpha_z(r)}\, e^{I_{z,r_0}(r)}\, dr < \infty$.
We now proceed to our proofs, where we establish positive recurrence results by showing that the above conditions are satisfied.
Proof of Proposition 2.2 It suffices to show that the two-dimensional continuous Markov process $(\sigma^{1,1}, \sigma^{2,1})$ is positive recurrent. For this, set $H_i(x) = \int_0^x \frac{1}{v_i g(y)}\, dy$, which is a strictly increasing bijection from $\mathbb{R}$ to itself, and then $Z^i = H_i(\sigma^{i,1})$ for $i \in \{1, 2\}$. It suffices to show that the two-dimensional process $Z = (Z^1, Z^2)$ is positive recurrent. The infinitesimal generator $L_Z$ of $Z$ maps any smooth function F : , with $V_i$ being a continuous and strictly decreasing bijection from $\mathbb{R}$ to itself for $i \in \{1, 2\}$.
We can compute and also $B(s, (x, y)) = 1$ and for all $(x, y), (z, w) \in \mathbb{R}^2$. Since the coefficients of $L_Z$ are continuous, with the higher-order ones being constant, we can easily verify conditions 1) and 2). Moreover, since $B$ and $C_{(z,w)}$ are constant in $t$ and continuous in $(x, y)$ while $A_{(z,w)}$ is bounded below by $\frac{1}{2}(1 - \lambda) > 0$, it follows that we have 3) as well.
Next, we choose $z$ and $w$ to be the unique roots of $V_1(x)$ and $V_2(y)$, respectively, and we have $\alpha_{(z,w)}(r) = \inf$ Therefore, all the required conditions are satisfied for the process $Z = (Z^1, Z^2)$, which means that $(Z^1, Z^2)$ is a positive recurrent diffusion, and thus $(\sigma^{1,1}, \sigma^{2,1})$ is positive recurrent as well.
Proof of Proposition 2.3 We first show that each volatility process never hits zero. which tends to $-\infty$ as $n \to \infty$. This shows that our volatility processes remain positive forever. As our volatility processes are strictly positive, we can set $Z^i = \ln \sigma^{i,1}$ for $i \in \{1, 2\}$, and we need to show that $(Z^1, Z^2)$ is a positive recurrent diffusion. Again, we can easily determine the infinitesimal generator $L_Z$ of $Z = (Z^1, Z^2)$, which this time maps any smooth function $F : \mathbb{R}^2 \to \mathbb{R}$ to $L_Z F(x, y) = V_1(x) F_x(x, y) + V_2(y) F_y(x, y)$ for $\lambda = \rho_{2,1} \rho_{2,2} < 1$ and $V_i(x) = e^{-x}\big(\kappa_i \theta_i - \frac{v_i^2}{2}\, \tilde{g}^2(e^x)\big) - \kappa_i$ for $i \in \{1, 2\}$, which are again two continuous and strictly decreasing bijections from $\mathbb{R}$ to itself. This can be shown by using the fact that $\tilde{g}$ is increasing and bounded above by 1.
Using the inequality $ab \ge -\frac{a^2 + b^2}{2}$, we obtain where we have used the two-dimensional mean-value theorem for each of the three terms, and the fact that all of the functions involved have a bounded gradient in $[-N, N]^2$ (since $\tilde{g}$ has continuous derivatives). Thus if we take $\delta_N(r) = C_N r$, we have 2) as well.
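The monotonicity of the drift functions $V_i$ in the proof of Proposition 2.3 can be illustrated numerically. The sketch below is only a sanity check under stated assumptions, not a substitute for the argument: the parameter values, the particular function `g_tilde` (chosen to be increasing and bounded above by 1) and the grid are all hypothetical, and the parameters are picked so that $\kappa\theta > v^2/2$. The check covers strict monotonicity and a sign change on a finite grid only; it does not verify the range of $V_i$.

```python
import numpy as np

# Hypothetical parameters, chosen so that kappa * theta > v**2 / 2.
kappa, theta, v = 2.0, 0.5, 1.0

def g_tilde(y):
    """Hypothetical choice: increasing on y > 0 and bounded above by 1."""
    return (1.0 + y) / (2.0 + y)

def V(x):
    """Drift of Z = ln(sigma): e^{-x} (kappa*theta - v^2/2 * g_tilde(e^x)^2) - kappa."""
    return np.exp(-x) * (kappa * theta - 0.5 * v**2 * g_tilde(np.exp(x))**2) - kappa

xs = np.linspace(-10.0, 10.0, 2001)
V_vals = V(xs)

is_decreasing = bool(np.all(np.diff(V_vals) < 0.0))    # strictly decreasing on the grid
has_sign_change = bool(V_vals[0] > 0.0 > V_vals[-1])   # so a root lies inside the grid
print(is_decreasing, has_sign_change)
```

With these choices both terms in the derivative of `V` are negative, which is the mechanism the proof relies on when it uses that $\tilde{g}$ is increasing and bounded above by 1.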