Fast mean-reversion asymptotics for large portfolios of stochastic volatility models

We consider an SPDE description of a large portfolio limit model where the underlying asset prices evolve according to certain stochastic volatility models with default upon hitting a lower barrier. The asset prices and their volatilities are correlated via systemic Brownian motions, and the resulting SPDE is defined on the positive half-space with Dirichlet boundary conditions. We study the convergence of the loss from the system, a function of the total mass of a solution to this stochastic initial-boundary value problem under fast mean reversion of the volatility. We consider two cases. In the first case the volatility converges to a limiting distribution and the convergence of the system is in the sense of weak convergence. On the other hand, when only the mean reversion of the volatility goes to infinity we see a stronger form of convergence of the system to its limit. Our results show that in a fast mean-reverting volatility environment we can accurately estimate the distribution of the loss from a large portfolio by using an approximate constant volatility model which is easier to handle.


Introduction
In this paper our aim is to investigate the fast mean reverting volatility asymptotics for an SPDE-based structural model for portfolio credit. SPDEs arising from large portfolio limits of collections of defaultable constant volatility models were initially studied in Bush, Hambly et al. [5], and their regularity was further investigated in Ledger [24]. In Hambly and Kolliopoulos [15,16,17] we extended this work to a two-dimensional stochastic volatility setting, and here we consider the question of effective one-dimensional constant volatility approximations which arise by considering fast mean-reversion in the volatilities. This approach is to some extent motivated by the ideas of Fouque, Papanicolaou and Sircar [11], but instead of option prices we look at the systemic risk of large credit portfolios in the fast mean-reverting volatility setting.
The literature on large portfolio limit models in credit can be divided into two approaches based on either structural or reduced form models for the individual assets. Our focus will be on the structural approach where we assume that we are modelling the financial health of the firms directly and default occurs when these health processes hit a lower barrier.
The reduced form setting assumes that the default of each firm occurs as a Poisson process and we model the default intensities directly. These can be correlated through systemic factors and through the losses from the portfolio. The evolution of the large portfolio limit of the empirical measure of the loss can be analysed as a law of large numbers and then Gaussian fluctuations derived around this limit, see Giesecke, Sirignano et al. [13,27,29,12] and Cvitanic et al. [6]. Further, the large deviations can be analysed, see Sowers and Spiliopoulos [30,31]. It is also possible to take an approach through interacting particle systems where each firm is in one of two states, representing financial health and financial distress, and there is a movement between states according to some intensity, often firm dependent, and dependent on the proportion of losses, see for instance Dai Pra and Tolotti [8] or Dai Pra et al. [7].
Our underlying set up is a structural model for default in which each asset has a distance to default, which we think of as the logarithmically scaled asset price process. The asset price evolves according to a general stochastic volatility model, in which the distance to default of the i-th asset X i satisfies the system (1.1) for all i ∈ N, where the coefficient vectors C i = (r i , ρ 1,i , ρ 2,i , k i , θ i , ξ i ) are picked randomly and independently from some probability distribution with ρ 1,i , ρ 2,i ∈ [0, 1), the infinite sequence {(x 1 , σ 1,init ), (x 2 , σ 2,init ), ...} of random vectors in R 2 is assumed to be exchangeable, and g, h are functions for which we will give suitable conditions later. The exchangeability condition implies (see [1,20]) the existence of a σ-algebra G ⊂ σ({(x i , σ i ) : i ∈ N}), given which the two-dimensional random vectors (x i , σ i ) are pairwise independent and identically distributed. The idiosyncratic Brownian motions W i , B i for i ∈ N are taken to be pairwise independent, and also independent of the systemic Brownian motions W 0 , B 0 which have a constant correlation ρ 3 . We regard this as a system for Z i = (X i , σ i ) with Then, the infinitesimal generator of the above two-dimensional process is given by a i jk ∂ 2 f ∂x j ∂x k for f ∈ C 2 (R + × R, R). The matrix A i = (a i jk ) is given by $A^i = \begin{pmatrix} h^2(\sigma) & h(\sigma)\xi_i g(\sigma)\rho_{1,i}\rho_{2,i}\rho_3 \\ h(\sigma)\xi_i g(\sigma)\rho_{1,i}\rho_{2,i}\rho_3 & \xi_i^2 g(\sigma)^2 \end{pmatrix}$, as A i = Σ i R(Σ i ) ⊤ , with R the covariance matrix for the 4-dimensional Brownian motion W i . We can show that the empirical measure of a sequence of finite sub-systems converges weakly as N → ∞ (see [17]) to the probability distribution of Z 1 t given W 0 , B 0 and G. This measure consists of two parts: its restriction to the line x = 0, which is approximated by the restriction of ν N to this line, and its restriction to R + × R, which possesses a two-dimensional density u(t, x, y).
The density u(t, x, y) can be regarded as an average of solutions to certain two-dimensional SPDEs with a Dirichlet boundary condition on the line x = 0. In particular, we can write u = E[u C 1 | W 0 , B 0 , G], where u C 1 (t, x, y) is the probability density of Z 1 t given W 0 , B 0 , G and C 1 on R + × R, which satisfies, for any value of the coefficient vector C 1 , the two-dimensional SPDE where A 1, * is the adjoint of the generator A 1 of Z 1 , and the operator B 1, * is given by B 1, * f = −ρ 1,1 h(y) ∂f ∂x , −ξ 1 ρ 2,1 g (y) ∂f ∂y .
The boundary condition is that u C 1 (t, 0, y) = 0 for all y ∈ R. In the special case where the coefficients are constants independent of i, u is itself a solution to the stochastic partial differential equation (1.2). One reason for studying the large portfolio limit is the need to have a useful approximation which captures the dynamics among the asset prices when the number of assets is large. Moreover, by studying the limit SPDE instead of a finite sub-system of (1.1), we can potentially provide a more efficient approach to capturing the key drivers of a large portfolio without having to simulate a large number of idiosyncratic Brownian paths.
Of central importance will be the loss function L, the mass of the probability distribution of Z 1 t given W 0 , B 0 and G on the line x = 0, which measures the total loss in the large portfolio limit. The distribution of this function is a simple measure of risk for the portfolio of assets and can be used to find the probability of a large loss, or to determine the prices of portfolio credit derivatives such as CDOs that can be written as expectations of suitable functions of L. Thus our focus will be on estimating probabilities of the form for some 0 ≤ a < b ≤ 1, that is the probability that the total loss from the portfolio lies within a certain range. Probabilities of the above form can be approximated numerically with a simulated sample of values of L t , obtained via after solving the SPDE (1.2) for u C 1 numerically, for a sample {c 1,1 , ..., c 1,n } of values of the vector C 1 . In the special case when asset prices are modelled as simple constant volatility models, the numerics (see Giles and Reisinger [14], or Bujok and Reisinger [4] for jump-diffusion models) have a significantly smaller computational cost, which motivates the investigation of the existence of accurate approximations using a constant volatility setting in the general case. We note also that one-dimensional SPDEs describing large portfolio limits in constant volatility environments have been found to have a unique regular solution (see [5], or Hambly and Ledger [18] for a loss-dependent correlation model), an important component of the numerical analysis and a counterpoint to the fact that we have been unable to establish uniqueness of solutions to the two-dimensional SPDE arising in the CIR volatility case [15].
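The sampling approach described above can be illustrated, in the simpler constant volatility case, by a direct particle simulation rather than an SPDE solver. The following is a minimal sketch under stated assumptions: the drift r − σ²/2, the Euler time stepping, the discretely monitored barrier at zero and every function and parameter name (`simulate_loss`, `loss_probability`, `n_firms`, and so on) are illustrative choices and are not taken from the paper. It is not the numerical method of [14] or [4].

```python
import numpy as np

def simulate_loss(n_firms, n_steps, t, x0, r, sigma, rho, rng):
    """One sample of the loss fraction at time t in a finite constant-volatility proxy.

    Assumed dynamics (illustrative only):
        dX^i = (r - sigma^2/2) dt + sigma (sqrt(1 - rho^2) dW^i + rho dW^0),
    with default of firm i the first time X^i <= 0 on the time grid.
    """
    dt = t / n_steps
    x = np.full(n_firms, x0, dtype=float)
    alive = np.ones(n_firms, dtype=bool)
    for _ in range(n_steps):
        dw0 = rng.normal(0.0, np.sqrt(dt))           # systemic increment, shared by all firms
        dwi = rng.normal(0.0, np.sqrt(dt), n_firms)  # idiosyncratic increments
        x += (r - 0.5 * sigma**2) * dt + sigma * (np.sqrt(1.0 - rho**2) * dwi + rho * dw0)
        alive &= x > 0.0                             # a firm stays defaulted once it has hit the barrier
    return 1.0 - alive.mean()

def loss_probability(a, b, n_samples=200, seed=0, **model):
    """Monte Carlo estimate of P(a <= L_t <= b) from independent systemic paths."""
    rng = np.random.default_rng(seed)
    losses = np.array([simulate_loss(rng=rng, **model) for _ in range(n_samples)])
    return float(np.mean((losses >= a) & (losses <= b))), losses

if __name__ == "__main__":
    p, _ = loss_probability(0.1, 0.3, n_firms=500, n_steps=100, t=1.0,
                            x0=0.5, r=0.02, sigma=0.3, rho=0.4)
    print("estimated P(0.1 <= L_t <= 0.3):", p)
```

Each call to `simulate_loss` draws one systemic path and averages over the idiosyncratic noise through the finite population of firms, so the spread of the returned losses reflects the systemic factor only up to a finite-N error. Discrete monitoring of the barrier also underestimates defaults, which is one reason a dedicated SPDE scheme is preferable in practice.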
We will derive our one-dimensional approximations under two different settings with fast mean-reverting volatility. In what we call the large vol-of-vol setting, the mean reversion and volatility in the second equation in (1.1) are scaled by suitable powers of ǫ in that k i = κ i /ǫ and ξ i = v i / √ ǫ, and then we take ǫ → 0. This is distributionally equivalent to speeding up the volatility processes by scaling the time t by ǫ, when ǫ is small. Our aim is to take the limit as ǫ → 0, so that when the system of volatility processes is positive recurrent, averages over finite time intervals involving the sped up volatility processes will approximate the corresponding stationary means. In the limit we obtain a constant volatility large portfolio model which could be used as an effective approximation when volatilities are fast mean-reverting. However, this speeding up does not lead to strong convergence of the volatility processes, allowing only for weak convergence of our system, which can only be established when ρ 3 = 0 (effectively separating the time scales) and when (κ i , θ i , v i , ρ 2,i ) is the same constant vector (κ, θ, v, ρ 2 ) for all i ∈ N.
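The averaging effect behind the large vol-of-vol limit can be checked numerically in the simplest case. The sketch below assumes an Ornstein-Uhlenbeck volatility (g ≡ 1) driven by a single Brownian motion, with mean reversion κ/ǫ and vol-of-vol v/√ǫ, and compares the time average of h(σ) over [0, t] with its stationary mean. The function name `time_average_h`, the parameter values and the choice h(σ) = σ² are hypothetical and serve only as an illustration.

```python
import numpy as np

def time_average_h(eps, h, kappa=1.0, theta=0.2, v=0.3, t=1.0,
                   sigma0=0.5, n_steps=100_000, seed=0):
    """Time average (1/t) * int_0^t h(sigma_s) ds for a sped-up OU volatility.

    Assumed dynamics (illustrative): d sigma = (kappa/eps)(theta - sigma) dt + (v/sqrt(eps)) dB.
    The exact Gaussian transition of the OU process is used on the grid, so the
    scheme stays stable even when kappa/eps is large.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    k = kappa / eps
    xi = v / np.sqrt(eps)
    decay = np.exp(-k * dt)
    step_std = xi * np.sqrt((1.0 - decay**2) / (2.0 * k))
    noise = rng.normal(0.0, step_std, n_steps)
    sigma, total = sigma0, 0.0
    for z in noise:
        total += h(sigma) * dt                       # left-point Riemann sum
        sigma = theta + (sigma - theta) * decay + z  # exact OU update over dt
    return total / t

if __name__ == "__main__":
    stationary_mean = 0.2**2 + 0.3**2 / (2 * 1.0)    # E[sigma^2] under N(theta, v^2 / (2 kappa))
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(eps, time_average_h(eps, lambda s: s**2), stationary_mean)
```

As ǫ decreases the printed time averages should cluster around the stationary mean, while the volatility path itself keeps fluctuating with an unchanged stationary variance. This is consistent with the text's point that the speeding up yields convergence of averages but not strong convergence of the volatility processes.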
The case of small vol-of-vol has the mean reversion in the second equation in (1.1) scaled by ǫ in that k i = κ i /ǫ and We regard this case as a small noise perturbation of the constant volatility model, where volatilities have stochastic behaviour but are pulled towards their mean as soon as they move away from it due to a large mean-reverting drift. When ǫ → 0, the drifts of the volatilities tend to infinity and dominate the corresponding diffusion parts since the vol-of-vols remain small, allowing for the whole system to converge to a constant volatility setting in a strong sense. This strong convergence allows the rate of convergence of probabilities of the form (1.3) to be estimated and gives us a quantitative measure of the loss in accuracy in the estimation of these probabilities when a constant volatility large portfolio model is used to replace a more realistic stochastic volatility perturbation of that model. In Sections 2 and 3 we present our main results for both settings. The results are then proved in Sections 4 and 5. Finally the proofs of two propositions showing the positive recurrence, and hence applicability of our results, for two classes of models can be found in the Appendix.

The main results: large vol-of-vol setting
We begin with the study of the fast mean-reversion, large vol-of-vol setting, for which we need to assume that the correlation ρ 3 of W 0 and B 0 is zero. When g is either the square root function or a function behaving almost like a positive constant for large values of the argument, it has been proven in Theorem 4.3 in [15] and in Theorem 4.1 in [17] respectively that where p t is the density of each volatility path when the path of B 0 and the information in G are given, and u(t, x, W 0 , G, where u 0 is the density of each x i given G. In the above expression for the two-dimensional density u C 1 (t, x, y), averaging happens with respect to the idiosyncratic noises, and since we are interested in probabilities concerning L t which is computed by substituting that density in (1.4), averaging happens with respect to the market noise (W 0 , B 0 ) as well. Therefore, we can replace (W i , B i ) for all i ≥ 0 in our system by objects having the same joint law. In particular, setting k i = κ i /ǫ and ξ i = v i / √ ǫ, the i-th asset's distance to default X i,ǫ satisfies the system where the ǫ superscripts are used to underline the dependence on ǫ, and if we substitute t = ǫt ′ and s = ǫs ′ for 0 ≤ s ′ ≤ t ′ and then replace ( ) for all i ≥ 0 which have the same joint law, the SDE satisfied by the i-th volatility process becomes This shows that σ i,ǫ = σ i,ǫ ǫ× · ǫ can be replaced by σ 1,1 · ǫ for all i ≥ 1, i.e. by the i-th volatility process of our model when the mean-reversion coefficient and the vol-of-vol are equal to κ i and v i respectively and when the time t is scaled by ǫ, which speeds up the system of the volatilities when ǫ is small.
If g is now chosen so that the system of volatility processes becomes positive recurrent, averages over finite time intervals converge to the corresponding stationary means as the speed tends to infinity, i.e as ǫ → 0 + , which is the key for the convergence of our system. We give a definition of the required property for g. Definition 2.1 (Positive recurrence property). We fix the distribution from which each is chosen, and we denote by C the σ-algebra generated by all these coefficient vectors. Then, we say that g has the positive recurrence property when the two-dimensional process (σ i,1 · , σ j,1 · ) is a positive recurrent diffusion for any two i, j ∈ N, for almost all values of C ′ i and C ′ j . This means that given C, there exists a two-dimensional random variable (σ i,j,1, * , σ i,j,2, * ) whose distribution is stationary for (σ i,1 · , σ j,1 · ), and whenever E[|F (σ i,j,1, * , σ i,j,2, * )| | C] exists and is finite for some measurable function F : R 2 → R we also have: or equivalently, after a change of variables, for any t ≥ 0, P-almost surely.
The positive recurrence property is a prerequisite for our convergence results to hold, and we now state two propositions which give classes of models for which this property is satisfied. The first shows that for the Ornstein-Uhlenbeck model (g(x) = 1 for all x ∈ R) we always have the positive recurrence property. The second shows that for the CIR model (g(x) = √|x| for all x ∈ R) we have the positive recurrence property provided that the random coefficients of the volatilities satisfy certain conditions. The proofs of both propositions can be found in the Appendix.
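To make the Ornstein-Uhlenbeck case concrete, the following worked computation considers a single volatility process with fixed coefficients (κ, θ, v) and g ≡ 1, written with respect to one standard Brownian motion \tilde{B}. Combining the idiosyncratic and systemic noises into \tilde{B} is an assumption made here for illustration, and ε denotes the scaling parameter written ǫ in the text.

```latex
% Sped-up OU volatility with k = \kappa/\epsilon and \xi = v/\sqrt{\epsilon}:
\[
  d\sigma^{\epsilon}_t = \frac{\kappa}{\epsilon}\,(\theta - \sigma^{\epsilon}_t)\,dt
                       + \frac{v}{\sqrt{\epsilon}}\,d\tilde{B}_t .
\]
% Its transition law is Gaussian:
\[
  \sigma^{\epsilon}_t \,\big|\, \sigma^{\epsilon}_0
  \;\sim\; \mathcal{N}\!\left(
     \theta + (\sigma^{\epsilon}_0 - \theta)\,e^{-\kappa t/\epsilon},\;
     \frac{v^2}{2\kappa}\left(1 - e^{-2\kappa t/\epsilon}\right)
  \right),
\]
% so the stationary distribution does not depend on \epsilon:
\[
  \sigma^{*} \sim \mathcal{N}\!\left(\theta,\; \frac{v^2}{2\kappa}\right),
  \qquad\text{since}\qquad
  \frac{\xi^2}{2k} = \frac{v^2/\epsilon}{2\kappa/\epsilon} = \frac{v^2}{2\kappa}.
\]
```

In this example the relaxation time ε/κ shrinks as ε → 0 while the stationary law stays fixed, which is why time averages converge to stationary means although the volatility path itself does not converge.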
Proposition 2.2. Suppose that g is a differentiable function, bounded from below by some c g > 0. Suppose also that for all x ∈ R and i ∈ N, for all possible values of C i . Then g has the positive recurrence property.
which is a deterministic vector in R 4 , the function h is bounded, and that g has the positive recurrence property, in which case we have σ 1,1 = E[h(σ 1,1,1, * )], σ 2,1 = E[h 2 (σ 1,1,1, * )], and σ = E[h(σ 1,2,1, * )h(σ 1,2,2, * )]. Consider now the one-dimensional large portfolio model where the distance to default of the i-th asset X i, * t evolves in time according to the system whereρ 1,i = ρ 1,iσ σ 2,1 . Then, we have the convergence Remark 2.5. Since all volatility processes have the same stationary distribution, a simple application of the Cauchy-Schwarz inequality shows thatσ ≤ σ 2,1 , which implies thatρ 1,i ≤ ρ 1,i < 1 and 1 −ρ 2 1,i is well-defined for each i. The above theorem gives only weak convergence and only under the restrictive assumption of having the same coefficients in each volatility. For this reason, we will also study the asymptotic behaviour of our system from a different perspective. In particular, we will fix the volatility path σ 1,1 and the coefficient vectors C ′ i , and we will study the convergence of the solution u ǫ (t, x) to the SPDE (2.1) in the sped up setting, i.e. which is used to compute the loss L ǫ t . We write now E σ,C to denote the expectation given the volatility path σ 1,1 and the C i s, which we have fixed, and L 2 σ,C to denote the corresponding L 2 norms. By 2. of Theorem 4.1 in [15], the solution u ǫ to the above SPDE satisfies the identity which shows that the L 2 (R + ) norms of u ǫ , and also its L 2 ([0, T ] × R + ) norms (for any T > 0), are all uniformly bounded by a random variable which has a finite L 2 σ,C (Ω) norm (the assumptions made in [15] are also needed for this). It follows that in a subsequence of any given sequence of values of ǫ tending to zero, we have weak convergence to some element u * (see [3]), and we can have this both in L 2 σ,C ([0, T ] × R + × Ω) and P-almost surely in L 2 ([0, T ] × R + ). The characterization of the weak limits u * is given in the following theorem. Theorem 2.6. 
Suppose that g has the positive recurrence property and that for some If h is bounded from below by a positive constant c > 0, the same weak convergence holds also in , and u * is then the unique solution to (2.4) in that space. In this case there is a unique subsequential weak limit, and thus we have weak convergence as ǫ → 0 + .
It is not hard to see that the limiting SPDE (2.4) obtained in Theorem 2.6 corresponds to a constant volatility large portfolio model like the one given in Theorem 2.4 under the assumption that (κ This indicates that the convergence of the loss L ǫ t can only be established in a weak sense, as in general we will haveσ > σ 1,1 and thus ρ 1,i > ρ ′ 1,i for all i. This is stated explicitly in the next Proposition and its Corollary.

The main results: small vol-of-vol setting
We proceed now to the small vol-of-vol setting, where now only the volatility drifts are scaled by ǫ, i.e k i = κ i /ǫ for all i. This leads to the model where the i-th asset's distance to default satisfies The main feature of the above model is that when the random coefficients and the function g satisfy certain conditions, the i-th volatility process σ i,ǫ converges in a strong sense to the C-measurable mean θ i as ǫ → 0 + for all i ∈ N, and we can also determine the rate of convergence. The required conditions are the following, and they will be assumed to hold throughout the rest of this section: 1. The i.i.d random variables σ i , ξ i , θ i , κ i take values in some compact subinterval of R, with each κ i being bounded from below by some deterministic constant c κ > 0.
2. g is a C 1 function with at most linear growth (i.e |g(x)| ≤ C 1,g + C 2,g |x| for some C 1,g , C 2,g > 0 and all x ∈ R).

3. Both the function h and its derivative have polynomial growth.
Under the above conditions, the convergence of each volatility process to its mean is given in the following proposition The reason for having only weak convergence of our system in the large vol-of-vol setting was the fact that the limiting quantities σ 1,1 , σ 2,1 andσ did not coincide. On the other hand, Proposition 3.1 implies that the corresponding limits in the small vol-of-vol setting are equal, allowing us to hope for our system to converge in a stronger sense.
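As a sanity check of the strong convergence in the simplest case, consider an Ornstein-Uhlenbeck volatility (g ≡ 1) with fixed coefficients, driven by a single standard Brownian motion \tilde{B}. This special case and the notation \sigma_0 for the initial value are assumptions made for illustration, and ε denotes the parameter written ǫ in the text.

```latex
% Small vol-of-vol: only the drift is scaled, k = \kappa/\epsilon, \xi fixed.
\[
  d\sigma^{\epsilon}_t = \frac{\kappa}{\epsilon}\,(\theta - \sigma^{\epsilon}_t)\,dt
                       + \xi\,d\tilde{B}_t ,
  \qquad \sigma^{\epsilon}_0 = \sigma_0 .
\]
% Second moment of the deviation from the mean:
\[
  \mathbb{E}\big[(\sigma^{\epsilon}_t - \theta)^2\big]
  = (\sigma_0 - \theta)^2 e^{-2\kappa t/\epsilon}
  + \frac{\xi^2\,\epsilon}{2\kappa}\left(1 - e^{-2\kappa t/\epsilon}\right).
\]
% Integrating over [0, T]:
\[
  \int_0^T \mathbb{E}\big[(\sigma^{\epsilon}_t - \theta)^2\big]\,dt
  \;\le\; \frac{\epsilon}{2\kappa}\Big( (\sigma_0 - \theta)^2 + \xi^2\,T \Big)
  \;=\; O(\epsilon) \quad \text{as } \epsilon \to 0^{+} .
\]
```

In contrast with the large vol-of-vol scaling, the stationary variance ξ²ε/(2κ) itself vanishes here, so the volatility collapses onto θ in L², in line with the O(ǫ) behaviour invoked in the proofs of Section 5.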
Let u ǫ be the solution to the SPDE (2.1) in the small vol-of-vol setting, where we have fixed the volatility paths and the random coefficients. Working as in the case of (2.2) and the proof of Theorem 2.3, it is possible to establish similar asymptotic properties for the SPDE as ǫ → 0 + . However, it is more convenient to work with the antiderivative $v^{0,\epsilon}(\cdot, x) := \int_x^{+\infty} u^{\epsilon}(\cdot, y)\,dy$, which satisfies the same SPDE but with different initial and boundary conditions, as the loss L ǫ t = 1 − P[X 1,ǫ t > 0 | W 0 , B 0 , G] equals the average of its value at 0 over all possible volatility paths and coefficient values, while its convergence can be established in a much stronger sense and without the need to assume that W 0 and B 0 are uncorrelated. Our main result is stated below , which arises from the constant volatility model 2) corresponds to the model (3.3) in the sense that given the loss L t , the mass of non-defaulted assets 1 − L t equals In order to estimate the rate of convergence of probabilities of the form (1.3), we consider the approximation error 1], and determine its order of convergence.
Observe now that since the conditional probabilities take values in the compact interval [0, 1], it is equivalent to have (4.1) for all continuous G : [0, 1] → R, and by the Weierstrass approximation theorem and linearity, we actually need to have this only when G is a polynomial of the form G(x) = x m . We now write Y i,ǫ for the i-th asset's distance to default in the sped up volatility setting, when the stopping condition at zero is ignored, that is The m stochastic processes {X i,ǫ : 1 ≤ i ≤ m} are obviously pairwise i.i.d when the information contained in W 0 , B 0 and G is given. Therefore we can write: Next, for each i, we write Y i, * for the process X i, * when the stopping condition at zero is ignored, that is Again, it is easy to check that the processes Y i, * are pairwise i.i.d when the information contained in W 0 , B 0 and G is given. Thus, we can write Then, (4.2) and (4.3) show that the result we want to prove has been reduced to the convergence in distribution as ǫ → 0 + (since the probability that any of the m minimums equals zero is zero, as the minimum of any Gaussian process is always continuously distributed, while Y i,ǫ is obviously Gaussian for any given path of σ i,1 ). Let C([0, t]; R m ) be the classical Wiener space of continuous functions defined on [0, t] and taking values in R m (i.e the space of these functions equipped with the supremum norm and the Wiener probability measure), and observe that min 1≤i≤m p i (min 0≤s≤t ·(s)) defined on C([0, t]; R m ), where p i stands for the projection on the i-th axis, is a continuous functional. Indeed, for any two continuous functions for some s 1 , s 2 ∈ [0, t] and 1 ≤ i 1 , i 2 ≤ m, and without loss of generality we may assume that the difference inside the last absolute value is nonnegative. Moreover we have: and thus min 1≤i≤m Obviously, max 1≤i≤m p i (·(t)) defined on C([0, t]; R m ) is also continuous (as the maximum of finitely many evaluation functionals). 
Therefore, our problem is finally reduced to showing that (Y 1,ǫ , Y 2,ǫ , ..., Y m,ǫ ) converges in distribution to (Y 1, * , Y 2, * , ..., Y m, * ) in the space C([0, t]; R m ), as ǫ → 0 + .
In order to show the convergence in distribution we first establish that a limit in distribution exists as ǫ → 0 + by using a tightness argument, and then we will characterize the limits of the finite dimensional distributions. To show tightness of the laws of (Y 1,ǫ , Y 2,ǫ , ..., Y m,ǫ ) for ǫ ∈ R + , which implies the desired convergence in distribution, we recall a special case of Theorem 3.7.2 in Ethier and Kurtz [9] for continuous processes, according to which it suffices to prove that for a given η > 0, there exist some δ > 0 and N > 0 such that: and P sup 0≤s 1 ,s 2 ≤t, |s 1 −s 2 |≤δ for all ǫ > 0. (4.4) can easily be achieved for some very large N > 0, since we have .., x m ), which is independent of ǫ and almost surely finite (the sum of the probabilities that the norm of this vector belongs to [n, n + 1] over n ∈ N is a convergent series and thus, by the Cauchy criterion, the same sum but for n ≥ N tends to zero as N tends to infinity). For (4.5), observe that | · | R m can be any of the standard equivalent L p norms of R m , and we choose it to be L ∞ . Then we have: and since the Ito integral The first of the last two probabilities is clearly zero for δ < η/(2(r + M )), while the second one can also be made arbitrarily small for small enough δ, since by a well-known result about the modulus of continuity of a Brownian motion (see Levy [25]) the supremum within that probability converges almost surely (and thus also in probability) to 0 as fast as M √(2δ ln(1/(M 2 δ))). Using these in (4.6) we deduce that (4.5) is also satisfied and we have the desired tightness result, which implies that (Y 1,ǫ · , ..., Y m,ǫ · ) converges in distribution to some limit (Y 1,0 · , ..., Y m,0 · ) (along some sequence).
Theorem 2.6. Let V be the set of W 0 · -adapted, square-integrable semimartingales on [0, T ]. Thus for any {V t : 0 ≤ t ≤ T } ∈ V, there exist two W 0 -adapted and square-integrable processes {v 1,t : 0 ≤ t ≤ T } and {v 2,t : 0 ≤ t ≤ T }, such that for all t ≥ 0. The processes of the above form for which {v 1,t : 0 ≤ t ≤ T } and for all 0 ≤ t ≤ T and i ∈ {1, 2}, with each F i being F W 0 t 1 -measurable, span a linear subspaceṼ which is dense in V under the L 2 norm. By using the boundedness of h and then the estimate (2.3), for any p > 0 and any T > 0 we obtain It follows that given a sequence ǫ n → 0 + , there always exists a subsequence {ǫ kn : n ∈ N}, such that h p (σ 1,1 · ǫ )u ǫ (·, ·) converges weakly to some u p (·, ·) in the space L 2 σ,C ([0, T ] × R + × Ω) for p ∈ {1, 2}. Testing (2.2) against an arbitrary smooth and compactly supported function f of x ∈ R + , using Ito's formula for the product of R + u ǫ (·, x)f (x)dx with a process V · ∈Ṽ having the form (4.7) -(4.8), and finally taking expectations, we find that: for all t ≤ T . Thus, setting ǫ = ǫ kn and taking n → +∞, by the weak convergence results mentioned above we obtain for all 0 ≤ t ≤ T . The convergence of the terms in the RHS of (4.10) holds pointwise in t, while the one term in the LHS converges weakly. Since we can easily find uniform bounds for all the terms in (4.10) (by using (4.9)), the dominated convergence theorem implies that all the weak limits coincide with the corresponding pointwise limits, which gives (4.11) as a limit of (4.10) both weakly and pointwise in t. It is clear then that for both i = 1 and i = 2, both weakly and pointwise in t ∈ [0, T ], while the limits are also differentiable in t everywhere except the two jump points t 1 and t 2 . This follows because everything is zero outside [t 1 , t 2 ], while both v 1 and v 2 are constant in t and thus of the form (4.7) -(4.8) if we restrict to that interval. 
Subtracting from each term of (4.10) the same term but with u ǫ replaced by u * and then adding it back, we can rewrite this identity as which tends to zero (when ǫ = ǫ kn and n → ∞) by the dominated convergence theorem, since the quantity inside the last integral converges pointwise to zero and it can be dominated by using (4.9). The same argument is used to show that the 4th and 6th terms in (4.12) also tend to zero along the same subsequence. Finally, for any term of the form for p, m ∈ {0, 1, 2}, we can recall the differentiability of the second factor inside the integral (which was mentioned earlier) and then use integration by parts to write it as: which converges, by the positive recurrence property, to the quantity Using integration by parts once more, this last expression is equal to This last convergence result holds also if we replace V by v 1 or v 2 , as we can show by following exactly the same steps in the subinterval [t 1 , t 2 ] (where v i is supported for i ∈ {1, 2} and where we have differentiability that allows integration by parts). If we set now ǫ = ǫ kn in (4.12), take n → +∞, and substitute all the above convergence results, we obtain SinceṼ is dense in V, for a fixed t ≤ T , we can have (4.13) for any square-integrable martingale {V s : 0 ≤ s ≤ t}, for which we have v 1,s = 0 for all 0 ≤ s ≤ t. Next, we denote by R u (t, x) the RHS of (2.4). Using then Ito's formula for the product of x)f (x)dx from both sides, taking expectations and finally substituting from (4.13), we find that for our fixed t ≤ T . Using the martingale representation theorem, V s can be taken equal to E σ,C I Es | σ {W 0 s ′ : s ′ ≤ s} for all s ≤ t, where we define and this implies V t = I Es allowing us to write for any 0 ≤ t ≤ T . 
If we integrate the above for t ∈ [0, T ] we obtain that where the quantity inside the expectation is always non-negative and becomes zero only and working in the same way with the indicator of the complement I E c t we can deduce the opposite inequality as well. Thus, we must have R + R u (t, x)f (x)dx = R + u * (t, x)f (x)dx almost everywhere, and since the function f is an arbitrary smooth function with compact support, we can deduce that R u coincides with u * almost everywhere, which gives (2.4).
If h is bounded from below, we can use (2.3) to obtain a uniform (independent of ǫ) bound for the H 1 0 (R + ) ⊗ L 2 σ,C (Ω × [0, T ]) norm of u ǫ kn , which implies that in a further subsequence, the weak convergence to u * holds also in that Sobolev space, in which (2.4) has a unique solution [5]. This implies convergence of u ǫ to the unique solution of (2. This calculation shows that this bound is only attainable when σ i,j,1, * = σ i,j,2, * for all i and j with i ≠ j, and this happens only when all the assets share a common stochastic volatility (i.e. ρ 2 = 1). For the lower bound, considering our volatility processes for i = 1 and i = 2 started from their 1-dimensional stationary distributions independently, we have for any t, ǫ ≥ 0 since σ 1,1 and σ 2,1 are identically distributed, and also independent when B 0 is given. Taking ǫ → 0 + on (4.14) and recalling the positive recurrence property, the definition ofσ, and the dominated convergence theorem on the LHS (since the quantity inside the expectation there is bounded by the square of an upper bound of h), we obtain the lower bound, i.e.σ ≥ σ 1,1 , which can also be shown to be unattainable in general. Indeed, if we choose h such that its compositionh with the square function is strictly increasing and convex, and if g is chosen to be a square root function (thus we are in the CIR volatility case), for any α > 0 we have Let σ ρ t be the solution to the SDE Then σ ρ can be shown to be the square root of a CIR process having the same mean-reversion and vol-of-vol as σ 1,1 and a different stationary mean, which satisfies the Feller condition for not hitting zero at a finite time. If for some t 1 > 0 we have σ ρ t 1 > σ B 0 t 1 , we consider t 0 = sup{s ≤ t 1 : σ ρ s = σ B 0 s } which is obviously non-negative. Then, since which is a contradiction. 
Thus σ ρ s ≤ σ B 0 s for all s ≥ 0, and in (4.15) this gives By the positive recurrence of σ ρ (which is the root of a CIR process, the ergodicity of which has been discussed in [11]), the RHS of the above converges to α 2 P(σ ρ, * ≥h −1 (α + σ 1,1 )) as ǫ → 0 + , where σ ρ, * has the stationary distribution of σ ρ . This expression can only be zero when σ ρ, * is a constant, and since the square of σ ρ satisfies Feller's boundary condition, this can only happen when ρ 2 = 0. In that case, we can easily check that σ 1,2,1, * and σ 1,2,2, * are independent, which implies thatσ = σ 1,1 . This completes the proof.
Corollary 2.8. Suppose that P(X 1,ǫ t ∈ I | W 0 · , B 0 · , G) converges to P(X 1, * t ∈ I | W 0 · , G) in probability, under the assumptions of both Theorem 2.4 and Theorem 2.6. The same convergence has to hold in a strong L 2 sense for some sequence ǫ n ↓ 0, since it will hold P -almost surely for some sequence, and then we can apply the dominated convergence theorem. Therefore, the same convergence must hold weakly in L 2 as well. However, assuming for simplicity that (r i , ρ 1,i ) is also a constant vector (r, ρ 1 ) for all i and fixing a sufficiently integrable and σ(W 0 · , B 0 · ) ∩ G-measurable random variable Ξ, by Theorem 2.6 we have where for each i we define given W 0 and G is the unique solution u * to (2.4) [18]. Therefore, by the uniqueness of a weak limit we must have P[X 1, * t ∈ I | W 0 , G] = P[X 1,w t ∈ I | W 0 , G] P-almost surely, which cannot be true for any interval I, as otherwise the processes X 1,w · and X 1, * · would coincide, which is clearly not the case here. Indeed, this can only be true whenρ 1,1 = ρ ′ 1 ⇔σ = σ 1,1 , and by Proposition 2.7 this is generally not the case unless ρ 2 = 0.

Proofs: small vol-of-vol setting
We proceed now to the proofs of Proposition 3.1, Theorem 3.2 and Corollary 3.3, the main results of Section 3.
Proposition 3.1. First, we will show that each volatility process has a finite 2p-moment for any p ∈ N. Indeed, we fix a p ∈ N and we consider the sequence of stopping times {τ n,ǫ : n ∈ N}, where τ n,ǫ = inf{t ≥ 0 : σ i,ǫ t > n}. Setting σ i,n,ǫ t = σ i,ǫ t∧τn,ǫ , by Ito's formula we have where the stochastic integral is a martingale. Taking expectations, setting f (t, n, p, ǫ) = E[(σ i,n,ǫ t − θ i ) 2p ] and using the growth condition of g (|g(x)| ≤ C 1,g + C 2,g |x| for all x ∈ R) and simple inequalities, we can easily obtain f (t, n, p, ǫ) ≤ M + M ′ t 0 f (s, n, p, ǫ)ds with M, M ′ depending only on p, c g and the bounds of σ i , ξ i , θ i . Thus, using Gronwall's inequality we get a uniform (in n) estimate for f (t, n, p, ǫ), and then by Fatou's lemma we obtain the desired finiteness of This implies the almost sure finiteness of the conditional expectation Taking expectations given C, letting n → +∞ on (5.1), using the monotone convergence theorem (all quantities are monotone for large enough n) and the growth condition on g, we find that where again, M, M ′ depend only on p, c g and the bounds of σ i , ξ i , θ i . Using Gronwall's inequality again on the above, we obtain the estimate Then, we have that Now using Ito's formula for the L 2 norm (see Krylov and Rozovskii [22]), given the volatility path and C, we obtain where N (t, ǫ) is some noise due to the correlation between B 0 and W 0 , with E[N (t, ǫ)] = 0. In particular, since for some Brownian motion V 0 independent from B 0 we could have written W 0 = 1 − ρ 2 3 V 0 + ρ 3 B 0 , we will have Next, we can apply 2. of Theorem 4.1 in [15] to the SPDE (3.1) to find v 0,ǫ x (s, ·) L 2 σ,C (Ω×R + ) = u ǫ (s, ·) L 2 σ,C (Ω×R + ) ≤ u 0 (·) L 2 (Ω×R + ) for all s ≥ 0. Using this expression, we can obtain the following estimate ds, (5.3) and in the same way for some η > 0. 
Moreover, we have the estimate and by using $\|v^{0,\epsilon}_x(s, \cdot)\|_{L^2_{\sigma,C}(\Omega \times \mathbb{R}^+)} \le \|u_0(\cdot)\|_{L^2(\Omega \times \mathbb{R}^+)}$ again, we also obtain Using (5.3), (5.4), (5.5), (5.6) and (5.7) in (5.2), and then taking $\eta$ to be sufficiently small, we get the estimate and where $M, m > 0$ are constants independent of the fixed volatility path. Taking expectations in (5.8) to average over all volatility paths, we find that and using Gronwall's inequality on the above we finally obtain for some $\sigma^{1,\epsilon}_{s,*}$ lying between $\theta_1$ and $\sigma^{1,\epsilon}_s$, with for some $\lambda_1, \lambda_2 > 0$ and some $m \in \mathbb{N}$, which allows us to bound the RHS of (5.9) by a linear combination of terms of the form $\|\sigma^{1,\epsilon}_{\cdot} - \theta_1\|^p_{L^p(\Omega \times [0, T])}$, which are all $O(\epsilon)$ as $\epsilon \to 0^+$ by Proposition 3.1. The proof of the theorem is now complete.
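As a rough numerical illustration of the $O(\epsilon)$ behaviour of the volatility deviation used above, the following Python sketch estimates $E\int_0^T (\sigma_t - \theta)^2\,dt$ by Monte Carlo for a single volatility process. It assumes, purely for illustration, dynamics of the form $d\sigma_t = \frac{\kappa}{\epsilon}(\theta - \sigma_t)\,dt + \xi\,dW_t$, i.e. the special case $g \equiv 1$ (an Ornstein-Uhlenbeck process, which satisfies the linear growth condition). The function names and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def ou_l2_deviation(eps, kappa=1.0, theta=0.2, xi=0.5, sigma0=0.7,
                    T=1.0, n_steps=1000, n_paths=400, seed=0):
    """Monte Carlo estimate of E int_0^T (sigma_t - theta)^2 dt.

    Assumed (illustrative) model: d sigma = (kappa/eps)(theta - sigma) dt + xi dW,
    sampled with the exact Gaussian transition of the OU process.
    """
    rng = random.Random(seed)
    a = kappa / eps                      # mean-reversion speed
    dt = T / n_steps
    decay = math.exp(-a * dt)            # one-step decay of the deviation
    noise_sd = xi * math.sqrt((1.0 - decay ** 2) / (2.0 * a))
    total = 0.0
    for _ in range(n_paths):
        dev = sigma0 - theta             # deviation sigma_t - theta
        for _ in range(n_steps):
            total += dev * dev * dt      # left Riemann sum of the time integral
            dev = dev * decay + noise_sd * rng.gauss(0.0, 1.0)
    return total / n_paths


def ou_l2_deviation_exact(eps, kappa=1.0, theta=0.2, xi=0.5, sigma0=0.7, T=1.0):
    """Closed form of the same quantity for the assumed OU model."""
    a = kappa / eps
    w = (1.0 - math.exp(-2.0 * a * T)) / (2.0 * a)
    return (sigma0 - theta) ** 2 * w + xi ** 2 / (2.0 * a) * (T - w)


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        print(eps, ou_l2_deviation(eps), ou_l2_deviation_exact(eps))
```

Under these assumptions the closed form behaves like $\epsilon\,[(\sigma_0 - \theta)^2 + \xi^2 T]/(2\kappa)$ for small $\epsilon$, so reducing $\epsilon$ by a factor of ten should reduce the estimate by roughly the same factor, up to Monte Carlo and discretisation error.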
Next, for any $\eta > 0$ we have and if we denote by $S$ the $\sigma$-algebra generated by the volatility paths, since $X^{1,*}_t$ is independent of $S$ and the path of $B^0$, by using the Cauchy-Schwarz inequality we find that where the last step follows by using Morrey's inequality in dimension 1 (see e.g. Evans [10]) and Theorem 3.2. On the other hand, since $P[X^{1,*}_t > 0 \mid W^0, G]$ has a bounded density near $x$, uniformly in $t \in [0, T]$, we have Therefore, (5.11) gives for any $\eta > 0$, and in a similar way we can obtain . Using these two expressions in (5.10) and taking $\eta = \epsilon^p$ for some $p > 0$, we finally obtain which becomes optimal as $\epsilon \to 0^+$ when $1 - 2p = p \Leftrightarrow p = \frac{1}{3}$. This gives $E(x, T) = O(\epsilon^{1/3})$ as $\epsilon \to 0^+$.
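The choice $p = \frac{1}{3}$ can be checked with a short Python sketch. It assumes that the two competing error terms are of order $\epsilon^{1-2p}$ and $\epsilon^{p}$, as the balancing condition $1 - 2p = p$ suggests; the helper name `rate_exponent` and the search grid are hypothetical.

```python
from fractions import Fraction


def rate_exponent(p):
    """Exponent of eps in a bound of the assumed form C1*eps**(1-2p) + C2*eps**p.

    As eps -> 0+ the slower-decaying term dominates, so the overall rate
    is the smaller of the two exponents.
    """
    return min(1 - 2 * p, p)


# Exact rational grid on (0, 1/2); it contains p = 1/3 (k = 100).
grid = [Fraction(k, 300) for k in range(1, 150)]
best_p = max(grid, key=rate_exponent)

if __name__ == "__main__":
    print(best_p, rate_exponent(best_p))
```

Using exact fractions avoids floating-point ties: the exponent $\min(1 - 2p, p)$ increases for $p < \frac{1}{3}$ and decreases for $p > \frac{1}{3}$, so the grid search returns the balancing point.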

A APPENDIX: Proofs of positive recurrence results
In this Appendix we prove Proposition 2.2 and Proposition 2.3. Both proofs are based on Theorem 2.5 from Bhattacharya and Ramasubramanian [2], which gives sufficient conditions for an $n$-dimensional Markov process $X$ with infinitesimal generator to be positive recurrent, i.e. to possess an invariant probability distribution $v$ on $\mathbb{R}^n$ such that for any $v$-integrable function $f$. That theorem involves the functions
1. $a_{i,j}(\cdot, \cdot)$ and $b_i(\cdot, \cdot)$ are Borel measurable on $[0, T] \times \mathbb{R}^n$ and bounded on compacts.
2. For each $N > 0$, there exists a $\delta_N(r) \downarrow 0$ as $r \downarrow 0$ such that for all $t \ge 0$ and $x, y \in \mathbb{R}^n$ with $t, |x|_2, |y|_2 \le N$ we have where $\|\cdot\|_2$ stands for the matrix 2-norm.
3. For any compact $K \subset \mathbb{R}^n$ and every $z' \in \mathbb{R}^k$, the function is bounded away from $+\infty$ on $[0, +\infty] \times K$.
4. There exist $z \in \mathbb{R}^n$ and $r_0 > 0$ such that $\int_{r_0}^{+\infty} e^{-I_{z,r_0}(r)}\,dr = +\infty$ and $\int_{r_0}^{+\infty} e^{I_{z,r_0}(r)}\,dr < +\infty$.
We proceed now to our proofs, where we will establish positive recurrence results by showing that the above conditions are satisfied.
Proposition 2.2. It suffices to show that the two-dimensional continuous Markov process $(\sigma^{1,1}, \sigma^{2,1})$ is positive recurrent. To do this, we set $H_i(x) = \int_0^x \frac{1}{v_i g(y)}\,dy$, which is a strictly increasing bijection from $\mathbb{R}$ to itself, and then $Z^i = H_i(\sigma^{i,1})$, for $i \in \{1, 2\}$. Hence, it suffices to show that the two-dimensional process $Z = (Z^1, Z^2)$ is positive recurrent. The infinitesimal generator $L_Z$ of $Z$ maps any smooth function $F$ : , with $V_i$ being a continuous and strictly decreasing bijection from $\mathbb{R}$ to itself for $i \in \{1, 2\}$.
We can compute and also $B(s, (x, y)) = 1$ and for all $(x, y), (z, w) \in \mathbb{R}^2$. Since the coefficients of $L_Z$ are continuous, with the higher order ones being constant, we can easily verify conditions 1. and 2. Moreover, since $B$ and $C_{(z,w)}$ are constant in $t$ and continuous in $(x, y)$ while $A_{(z,w)}$ is lower-bounded by $\frac{1}{2}(1 - \lambda) > 0$, it follows that we have 3. as well.
Therefore, we have that all the required conditions are satisfied for the process $Z = (Z^1, Z^2)$, which means that $(Z^1, Z^2)$ is a positive recurrent diffusion, and thus $(\sigma^{1,1}, \sigma^{2,1})$ is positive recurrent as well.
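The role of the change of variables $Z^i = H_i(\sigma^{i,1})$ in the proof of Proposition 2.2 can be illustrated numerically. The Python sketch below assumes that the volatility has diffusion coefficient $v\,g(\sigma)$, so that by Itô's formula the transformed process has diffusion coefficient $H'(x)\,v\,g(x)$. The choice $g(y) = \sqrt{1 + y^2}$ is hypothetical: it is positive, has linear growth, and gives $H(x) = \operatorname{arcsinh}(x)/v$, a strictly increasing bijection of $\mathbb{R}$. The value of $v$ and the function names are also illustrative.

```python
import math


def g(y):
    # Hypothetical volatility function: positive, with linear growth.
    return math.sqrt(1.0 + y * y)


def H(x, v=0.5, n=2000):
    """Composite Simpson approximation of int_0^x dy / (v * g(y)); n must be even."""
    h = x / n
    s = 1.0 / (v * g(0.0)) + 1.0 / (v * g(x))
    for k in range(1, n):
        weight = 4.0 if k % 2 else 2.0
        s += weight / (v * g(k * h))
    return s * h / 3.0


def transformed_diffusion_coeff(x, v=0.5, dx=1e-4):
    """H'(x) * v * g(x), with H' approximated by a central difference."""
    h_prime = (H(x + dx, v) - H(x - dx, v)) / (2.0 * dx)
    return h_prime * v * g(x)


if __name__ == "__main__":
    for x in (-3.0, 0.5, 2.0):
        print(x, H(x), math.asinh(x) / 0.5, transformed_diffusion_coeff(x))
```

For this choice of $g$ the transformed diffusion coefficient is numerically equal to 1 at every point, which is consistent with the higher order coefficients of the generator of $Z$ being constant.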
We will show now that the last term in the RHS of (A.8) above is negative for $r \ge r_0$ with $r_0$ large enough (depending on $p$). Indeed, by using (A.4), the definition of $B(s, (x, y))$, and the fact that $\tilde{g}$ is upper-bounded, we can obtain the estimate $\sup_{(x-z)^2 + (y-w)^2 = r^2} \frac{B(s, (x, y)) + (1 - p)\,C_{(z,w)}(s, (x, y))}{A_{(z,w)}(s, (x, y))}$ where as before, we have $\xi = \frac{\max\{v_1, v_2\}^2}{2} \sup_{x \in \mathbb{R}} \tilde{g}(x)$, and $\kappa^* = (1 - p)\kappa$. The numerator in the last supremum can easily be shown to tend to $-\infty$ when $x$ or $y$ tends to $\pm\infty$, which happens when $r \to +\infty$. Thus, for $r \ge r_0$ with $r_0$ large enough, the RHS of (A.9) is negative.