Asymptotic optimality of a first-order approximate strategy for an exponential utility maximization problem with a small coefficient of wealth-dependent risk aversion

Abstract: In Delong (2019) we investigate an exponential utility maximization problem for an insurer who faces a stream of non-hedgeable claims. We assume that the insurer's risk aversion coefficient consists of a constant risk aversion and a small amount of wealth-dependent risk aversion. We apply perturbation theory and expand the equilibrium value function of the optimization problem in the parameter controlling the degree of the insurer's wealth-dependent risk aversion. We derive a candidate for the first-order approximation to the equilibrium investment strategy. In this paper we formally show that the zeroth-order investment strategy $\pi_0^*$ postulated by Delong (2019) performs better than any strategy $\pi_0$ when we compare the asymptotic expansions of the objective functions up to order $O(1)$ as $\epsilon \to 0$, and the first-order investment strategy $\pi_0^* + \epsilon\pi_1^*$ postulated by Delong (2019) is the equilibrium strategy in the class of strategies $\pi_0^* + \epsilon\pi_1$ when we compare the asymptotic expansions of the objective functions up to order $O(\epsilon^2)$ as $\epsilon \to 0$, where $\epsilon$ denotes the parameter controlling the degree of the insurer's wealth-dependent risk aversion.
Keywords: Wealth-dependent risk aversion, PDEs, perturbation theory, asymptotic optimality.

1 Introduction
In Delong (2019) we investigate an exponential utility maximization problem for an insurer who faces a stream of non-hedgeable claims. We assume that the insurer's risk aversion coefficient changes in time and depends on the insurer's current net asset value (the excess of assets over liabilities). Since the optimization problem is time-inconsistent, we follow the game-theoretic approach developed by Ekeland and Lazrak (2006), Ekeland and Pirvu (2008), Björk and Murgoci (2014) and Björk et al. (2017). We use the notion of an equilibrium strategy and derive the HJB equation for the equilibrium value function. In order to solve the HJB equation, we use perturbation theory. We assume that the insurer's risk aversion coefficient consists of a constant risk aversion and a small amount of wealth-dependent risk aversion. The equilibrium value function is expanded in the parameter controlling the degree of the insurer's wealth-dependent risk aversion. We derive candidates for the first-order approximations to the equilibrium value function and the equilibrium investment strategy. Delong (2019) proves many results which are essential to characterize the first-order approximation to the equilibrium investment strategy and to justify the choice of his investment strategy as the first-order approximation. However, the order of the error of approximating the true equilibrium investment strategy with the candidate first-order approximate solution has not been proved. In this paper we formally study the asymptotic optimality of the investment strategy postulated by Delong (2019).
More precisely, we show that the zeroth-order investment strategy $\pi_0^*$ postulated by Delong (2019) performs better than any strategy $\pi_0$ when we compare the asymptotic expansions of the objective functions up to order $O(1)$ as $\epsilon \to 0$, and the first-order investment strategy $\pi_0^* + \epsilon\pi_1^*$ postulated by Delong (2019) is the equilibrium strategy in the class of strategies $\pi_0^* + \epsilon\pi_1$ when we compare the asymptotic expansions of the objective functions up to order $O(\epsilon^2)$ as $\epsilon \to 0$, where $\epsilon$ denotes the parameter controlling the degree of the insurer's wealth-dependent risk aversion. From a mathematical point of view, the results complete the results from Delong (2019) and give a more rigorous justification for the strategy derived in Delong (2019). From an economic point of view, the proof that the candidate strategy from Delong (2019) is optimal (in some sense) is crucial for applications and conclusions derived from the model.
The assumption of constant risk aversion is best known in economics, finance and insurance. However, many empirical studies suggest that agents' risk attitudes are correlated with their wealth, see e.g. Shaw (1996), Wik et al. (2004), Anderson and Galinsky (2011), Bucciol and Miniaci (2011), Courbage et al. (2018). Consequently, we should use wealth-dependent risk aversion coefficients to model, and understand, economic decisions of investors and insurance companies in a risky environment. In practice, insurance companies implement investment strategies for asset and liability management in order to pay random claims and earn a profit. In this paper we study the optimality of an investment strategy for a risk-averse insurer with time-varying risk aversion depending on the available wealth. The framework with stochastic risk preferences should better reflect the risk attitude of an insurer trying to make optimal decisions in financial markets. The investment strategy, which we derive in our theoretical model, can be used as a reference point for developing investment strategies for asset and liability management in real life.
To the best of our knowledge, there are only two papers by Dong and Sircar (2014) and Delong (2019) which study exponential utility maximization problems for investors with wealth-dependent risk aversion coefficients. Moreover, the first-order approximation to the equilibrium investment strategy postulated by Delong (2019) is a new investment strategy and its properties are worth investigating.
Perturbation techniques have been popularized in financial mathematics by Fouque et al. (2011), Fouque et al. (2014) and Fouque and Hu (2017). In particular, the asymptotic optimality of a candidate strategy in the class of strategies given by $\pi_0 + \epsilon\pi_1$ is investigated by Fouque and Hu (2017) in a model where an investor maximizes an expected utility in a market with stochastic volatility. The idea to study the asymptotic expansions of the objective function up to orders $O(1)$, $O(\epsilon)$, $O(\epsilon^2)$ as $\epsilon \to 0$ and the asymptotic optimality of the candidate strategy in the class of strategies given by $\pi_0 + \epsilon\pi_1$ is taken from Fouque and Hu (2017). However, the techniques which we use in this paper are different from the techniques used by Fouque and Hu (2017), since the models are different. Moreover, we deal with an equilibrium strategy, which is not the optimal strategy in Bellman's sense, and we introduce a new asymptotic criterion for the equilibrium in order to formalize our asymptotic results.
In Sections 2-3 we briefly recall the model and the main results from Delong (2019) for the reader's convenience. The results from Delong (2019) are used in the proofs in this paper. In Section 4 we present the main result of this paper and we study the asymptotic optimality, in an appropriate sense, of the investment strategy from Delong (2019). The proofs can be found in Section 5.
In the sequel, the conditional expected value will be denoted by $E_y[\cdot] = E[\cdot \mid Y(t) = y]$, where $Y$ denotes the stochastic process which is used in the conditional expected value.
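We will use functions of order $O(\epsilon^\theta)$. Let us recall that $f^\epsilon(x) = O(\epsilon^\theta)$ if
$$|f^\epsilon(x)| \le K\epsilon^\theta, \quad 0 < \epsilon \le \epsilon_0, \qquad (1.1)$$
for some $\epsilon_0 > 0$, where $K$ is independent of $\epsilon$ but may depend on $(x, \epsilon_0)$.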
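As a quick illustration of this order notation, the following worked example checks the bound directly. It is a sketch only: the function $f^\epsilon$ below is a hypothetical choice that does not appear in the paper, and we assume the usual convention that $f^\epsilon(x) = O(\epsilon^\theta)$ means $|f^\epsilon(x)| \le K\epsilon^\theta$ for all sufficiently small $\epsilon$, with $K$ independent of $\epsilon$.

```latex
% Hypothetical example (not from the paper):
%   f^\epsilon(x) = \epsilon^2 x \sin(1/\epsilon) + \epsilon^3 x^2 .
\begin{align*}
|f^\epsilon(x)|
  &\le \epsilon^2 |x| + \epsilon^3 x^2 \\
  &\le \epsilon^2 \big( |x| + \epsilon_0 x^2 \big),
  \qquad 0 < \epsilon \le \epsilon_0 .
\end{align*}
% Hence f^\epsilon(x) = O(\epsilon^2) with K = |x| + \epsilon_0 x^2.
% The constant K depends on (x, \epsilon_0) but not on \epsilon.
```

Note that the constant in this example is not uniform in $x$, which is exactly the dependence on $(x, \epsilon_0)$ that the definition allows.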

2 The financial and insurance model
We deal with a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a filtration $\mathbb{F} = (\mathcal{F}_t)_{0 \le t \le T}$ and a finite time horizon $T < \infty$. On the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ we define a standard two-dimensional Brownian motion $(W, B) = (W(t), B(t), 0 \le t \le T)$ and a càdlàg (right-continuous with left limits) counting process $N = (N(t), 0 \le t \le T)$. We assume that (A1) the Brownian motion $(W, B)$ and the counting process $N$ are independent. The filtration $\mathbb{F}$ is right-continuous and completed with sets of measure zero. The financial market consists of a risk-free deposit $D = (D(t), 0 \le t \le T)$ and two risky indices: $S = (S(t), 0 \le t \le T)$, $P = (P(t), 0 \le t \le T)$. The value of the risk-free deposit is constant, see (2.1). The prices of the risky indices are modelled with correlated geometric Brownian motions, where $\mu, a, \sigma, b$ are positive constants which denote drifts and volatilities, and $\rho \in [-1, 1]$ denotes the correlation coefficient between the log-returns of $S$ and $P$. The insurance company can invest in the deposit $D$ and in the index $S$. The index $P$ is not available for trading. The index $P$ is the underlying investment fund for the insurance contracts sold by the insurance company, see below for a detailed description.
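The explicit dynamics of the two indices are not reproduced in this text. A minimal sketch of the standard form such a model takes is given below. It is an assumption consistent with the description above ($S$ driven by $W$, drifts $\mu, a$, volatilities $\sigma, b$, correlation $\rho$), not necessarily the exact parametrization used by Delong (2019).

```latex
% Assumed standard form of two correlated geometric Brownian motions
% (a sketch; the exact equations of the source are not shown in this text).
\begin{align*}
dS(t) &= \mu S(t)\,dt + \sigma S(t)\,dW(t), \\
dP(t) &= a P(t)\,dt + b P(t)\big(\rho\,dW(t) + \sqrt{1-\rho^2}\,dB(t)\big).
\end{align*}
% By Ito's formula the log-returns satisfy
\begin{align*}
d\log S(t) &= \big(\mu - \tfrac{1}{2}\sigma^2\big)\,dt + \sigma\,dW(t), \\
d\log P(t) &= \big(a - \tfrac{1}{2}b^2\big)\,dt
              + b\big(\rho\,dW(t) + \sqrt{1-\rho^2}\,dB(t)\big),
\end{align*}
% so that, since W and B are independent,
\begin{align*}
d\langle \log S, \log P \rangle(t) &= \sigma b \rho\,dt, &
\frac{\sigma b \rho}{\sigma \cdot b} &= \rho .
\end{align*}
```

Under this assumed form, $\rho$ is indeed the instantaneous correlation between the log-returns of $S$ and $P$, and for $|\rho| < 1$ the index $P$ carries the risk $B$ that cannot be hedged by trading in $S$.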
The insurance company keeps a homogeneous portfolio consisting of $n$ unit-linked policies. The counting process $N$ is used to count the number of deaths in the insurance portfolio. We assume that the lifetimes of the policyholders are independent and exponentially distributed. The parameter $\lambda$ denotes the mortality intensity in the population of the policyholders. We will use the process $J(t) = n - N(t)$, $0 \le t \le T$, which counts the number of policies in force in the insurance portfolio. We remark that (A1) means that we assume that the insurance risk is independent of the financial risk under the real-world measure $\mathbb{P}$.
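The distribution of the number of policies in force follows directly from the lifetime assumption. The short computation below is a sketch under two assumptions that this text does not spell out: the lifetimes $\tau_1, \ldots, \tau_n$ (a hypothetical notation) are i.i.d. exponential with intensity $\lambda$, and the number of policies in force is $J(t) = n - N(t)$.

```latex
% Sketch: tau_1, ..., tau_n i.i.d. Exp(lambda) (hypothetical notation),
%   N(t) = \sum_{i=1}^n 1\{\tau_i \le t\},   J(t) = n - N(t).
\begin{align*}
\mathbb{P}(\tau_i > t) &= e^{-\lambda t}, \\
\mathbb{P}\big(J(t) = j\big)
  &= \binom{n}{j} e^{-\lambda t j}\big(1 - e^{-\lambda t}\big)^{n-j},
  \qquad j = 0, 1, \ldots, n, \\
E[J(t)] &= n e^{-\lambda t}, \qquad
E[N(t)] = n\big(1 - e^{-\lambda t}\big), \\
\mathrm{Var}[J(t)] &= n e^{-\lambda t}\big(1 - e^{-\lambda t}\big).
\end{align*}
% Equivalently, N jumps by one with intensity \lambda J(t-).
```

In particular, the death intensity of the portfolio is proportional to the number of policies still in force, which is why $J(t-)$ rather than $n$ multiplies the benefit rates.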
The insurer faces a stream of non-hedgeable claims which is modelled with the process $C = (C(t), 0 \le t \le T)$ given by the equation (2.4). Each policyholder in the insurance portfolio is entitled to three types of benefits: annuity $\alpha$ paid as long as the policyholder lives, life insurance benefit $\beta$ paid if the policyholder dies and endowment benefit $\eta$ paid if the policyholder survives till the terminal time $T$. The benefits $\alpha, \beta, \eta$ are contingent on the non-tradeable index $P$. We assume that (A3) the functions $\alpha, \beta, \eta : (0, \infty) \to [0, \infty)$ are bounded and Lipschitz continuous.
In order to fulfill the future liabilities, the insurer must hold a reserve. The reserve is set for the policies in force. The reserve is defined by (2.5), where $Q$ denotes a pricing measure for $C$. Here, by reserve we mean an amount of money which the insurer sets aside to cover the future claims. In practice, the insurer can use best estimate, market-consistent or first-order assumptions to calculate the reserve, see e.g. Chapter 2 in Møller and Steffensen (2009). The pricing and reserving assumptions are reflected in the measure $Q$, under which the real-world dynamics of the risk factors are modified in accordance with the assumptions. We do not make any assumptions on the pricing measure $Q$ in (2.5). However, we assume that (A4) ..., n}, and the function $F^1$: ... In most cases, the insurance risk would be assumed to be independent of the financial risk under the pricing measure $Q$. If (A1) also holds under $Q$, then (A4) is satisfied. In the sequel, the reserve for one policy in force $F^1$ is simply denoted by $F$. For a detailed description of the financial and insurance model and a motivation for the optimization problem we refer to Delong (2019).
3 The optimization problem and the candidate first-order approximate strategy
Let $\pi := (\pi(t), 0 \le t \le T)$ denote an investment strategy which specifies the amount of money that the insurer invests in the index $S$. The wealth process of the insurer, denoted by $X^\pi = (X^\pi(t), 0 \le t \le T)$, satisfies the SDE:
$$dX^\pi(t) = \pi(t)\big(\mu\,dt + \sigma\,dW(t)\big) - J(t-)\alpha(P(t))\,dt + \beta(P(t))\,dJ(t), \quad 0 \le t \le T, \quad X^\pi(0) = x, \qquad (3.1)$$
where $x > 0$ denotes the initial wealth. The survival benefits $\eta$ are subtracted from $X^\pi(T)$ at the terminal time $T$. We study the time-inconsistent optimization problem (3.2), where $\Gamma$ denotes a time-varying risk aversion coefficient whose value at time $t$ depends on the process $R$. The process $R$ is interpreted as the insurer's net asset value, i.e. the excess of the insurer's assets over his liabilities. By the liability we mean the value of the reserve (2.5). The optimization problem (3.2) is called an exponential utility maximization problem with wealth-dependent risk aversion. We assume that the risk aversion coefficient in (3.2) satisfies the condition: (A5) $\Gamma : \mathbb{R} \to (0, \infty)$ is bounded, decreasing, Lipschitz continuous and of class $C^2(\mathbb{R})$.
Let us introduce the set of admissible investment strategies for our optimization problem (3.2).
We can now define the objective function $v^{k,\pi}(t, x, p)$ for (3.2), see (3.3). We will also need the auxiliary objective function $w^{k,\pi}(t, x, p, r)$, see (3.4). Due to the time-inconsistency caused by the wealth-dependent risk aversion, we cannot find a strategy $\pi$ which maximizes the objective function (3.3) and is optimal in Bellman's sense. We look for the sub-game perfect Nash equilibrium in the game with the reward given by (3.3), see e.g. Björk et al. (2017).
..., n} and all $\pi \in \mathcal{A}$, then $\pi^*$ is called the equilibrium strategy and $v^{k,\pi^*}$ is called the equilibrium value function corresponding to the equilibrium strategy $\pi^*$.
We consider a special structure of the wealth-dependent risk aversion coefficient $\Gamma$. We choose
$$\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r), \quad r \in \mathbb{R}. \qquad (3.6)$$
In this paper we assume that the insurer's risk aversion coefficient $\Gamma$ consists of a constant risk aversion $\gamma_0 > 0$ and a small amount $\epsilon > 0$ of wealth-dependent risk aversion $\gamma_1$. Similar to (A5), we impose the condition: (A6) The function $\gamma_1 : \mathbb{R} \to \mathbb{R}$ is bounded, decreasing, Lipschitz continuous and of class $C^2(\mathbb{R})$. Moreover, $\gamma_1(0) = 0$.
The assumption (3.6) allows us to apply perturbation theory and find the first-order approximation to the true solution to the optimization problem (3.2) for small $\epsilon > 0$. We can apply perturbation theory since our problem (3.2) can be formulated by adding a small term to a parameter of a related and exactly solvable problem. In our case, the exactly solvable problem is (3.2) with $\Gamma(r) = \gamma_0$. We expect that the solution to the time-inconsistent exponential utility maximization problem (3.2) with the wealth-dependent risk aversion coefficient $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$ can be expanded in powers of $\epsilon$ around the solution to the time-consistent exponential utility maximization problem with the constant risk aversion $\gamma_0$.
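To see the mechanism behind such an expansion in the simplest possible setting, consider the exponential utility at a fixed wealth level. The computation below is a toy illustration and not the expansion of the value function carried out in Delong (2019): we fix $r$, write $\gamma_1 := \gamma_1(r)$, take a deterministic wealth level $x$, and assume $\Gamma = \gamma_0 + \epsilon\gamma_1$. The remainder $R^\epsilon$ is a hypothetical notation.

```latex
% Toy illustration (assumptions: fixed r, gamma_1 := gamma_1(r), deterministic x).
\begin{align*}
e^{-(\gamma_0 + \epsilon\gamma_1)x}
  &= e^{-\gamma_0 x}\, e^{-\epsilon\gamma_1 x} \\
  &= e^{-\gamma_0 x}\big(1 - \epsilon\gamma_1 x\big)
     + e^{-\gamma_0 x} R^\epsilon(x),
\end{align*}
% where, by Taylor's theorem with the Lagrange form of the remainder,
\begin{align*}
|R^\epsilon(x)|
  \le \tfrac{1}{2}\,\epsilon^2 \gamma_1^2 x^2\, e^{\epsilon |\gamma_1 x|}
  \le \tfrac{1}{2}\,\epsilon^2 \gamma_1^2 x^2\, e^{\epsilon_0 |\gamma_1 x|},
  \qquad 0 < \epsilon \le \epsilon_0 .
\end{align*}
% Zeroth order: the constant-risk-aversion utility e^{-\gamma_0 x}.
% First order:  -\epsilon \gamma_1 x e^{-\gamma_0 x}.
% Remainder:    O(\epsilon^2) for each fixed x.
```

The zeroth-order term is the utility of the exactly solvable problem, and the first-order correction has the form $x e^{-\gamma_0 x}$; random variables of this form, evaluated at the wealth process, are the ones whose uniform integrability is examined in Section 5.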
2. The investment strategy (3.13) and the function (3.14) are candidate asymptotic first-order approximations, respectively, to the equilibrium investment strategy and the equilibrium value function for the optimization problem (3.2) with the wealth-dependent risk aversion coefficient $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$ as $\epsilon \to 0$.
4 The main result: asymptotic optimality of the candidate first-order approximate strategy
First, we specify the class of investment strategies in which we show that the investment strategy (3.13) is asymptotically optimal for our optimization problem (3.2) with the wealth-dependent risk aversion coefficient $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$ as $\epsilon \to 0$. Next, we formalize and explain what we mean by the asymptotic optimality of (3.13) in our optimization problem. Finally, we present the main result of this paper.
Definition 4.1. Let us consider the utility maximization problem (3.2) with the wealth-dependent risk aversion coefficient ... and $\pi$ has the representation (4.1).
2. The mappings $x \mapsto \pi_i^k\big(t, x, P(t, \omega)\big)$ satisfy the Lipschitz conditions ...
3. The mappings $x \mapsto \pi_i^k\big(t, x, P(t, \omega)\big)$ satisfy the growth conditions ...
We remark that the amount of $\pi_1$ added to $\pi_0$, in order to define the admissible strategy (4.1), is controlled with the parameter $\epsilon$ which represents the degree of the insurer's wealth-dependent risk aversion. If we choose $\pi_1 = 0$, then we can consider strategies independent of the parameter $\epsilon$ within the class $\mathcal{B}$. Finally, the processes $H$ and $\tilde{H}$, which appear in the Lipschitz and growth conditions, may depend on the strategies $\pi_0$, $\pi_1$.
Since we use perturbation techniques, the idea of which is to expand the true solution in powers of the small parameter $\epsilon$, it is natural to consider the investment strategies of the form (4.1) in point 1 of Definition 4.1, see also Fouque and Hu (2017). Points 2-4 from Definition 4.1 are closely related to points 2-4 from Definition 3.1. Points 2-3 from Definition 4.1 describe in more detail the measurable mapping $(t, x, p, k) \mapsto \pi^k(t, x, p)$ which characterizes the investment strategy. In particular, points 2-3 from Definition 4.1 imply that points 2-3 from Definition 3.1 are satisfied. They are rather standard in the theory of stochastic differential equations and backward stochastic differential equations with BMO-martingales, see Chapter V.3 in Protter (2005) and Ankirchner et al. (2007). Finally, since we add a small amount of $\pi_1$ to $\pi_0$ in order to define the strategy $\pi \in \mathcal{B}$ in (4.1), we expect that point 4 from Definition 3.1 should only be needed for $\pi_0$ (which is point 4 from Definition 4.1). In Proposition 5.2 below, we show that $\mathcal{B} \subset \mathcal{A}$ and that the candidate first-order approximation to the equilibrium strategy satisfies $\hat{\pi}^* \in \mathcal{B}$ for a sufficiently small $\epsilon > 0$. Although Definition 4.1 may look technical, we believe that it describes a very reasonable class of investment strategies which are important for our exponential utility maximization problem (3.2) with a small amount of wealth-dependent risk aversion and does not exclude any relevant strategies. We now present the main theorem of this paper.
(ii) The strategy $\pi_0^*$ performs better than any strategy $\pi_0$ when we compare the asymptotic approximations to the objective functions up to order $O(1)$, see (4.3). The equality in (4.3) holds only for $\pi_0 = \pi_0^*$.
(iii) For any strategy $\pi_0^* + \epsilon\pi_1$, we have the asymptotic first-order approximation (4.4) to the objective function.
(iv) The strategy $\pi_0^* + \epsilon\pi_1^*$ is the equilibrium strategy in the class of strategies $\pi_0^* + \epsilon\pi_1$ when we compare the asymptotic approximations to the objective functions up to order $O(\epsilon^2)$, see (4.5)-(4.6). The equality in (4.5) holds only for $\pi_1^\delta = \pi_1^*$.
Remark 4.1: a) The function $v^{k,\pi_0}$ depends on $\epsilon$ since we use $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$. The function $v^{k,\pi_0 + \epsilon\pi_1}$ depends on $\epsilon$ since we use $\pi = \pi_0 + \epsilon\pi_1$ and $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$. The subscript $\epsilon$ in $(v_\epsilon^k)_{k=0}^n$ will be omitted in the sequel. b) If we use $\Gamma(r) = \gamma_0$, then $\pi_0^*$ is the optimal investment strategy for the time-consistent exponential utility maximization problem (3.2) with the constant risk aversion coefficient $\gamma_0$, and the functions $(v_0^k)_{k=0}^n$ define the corresponding optimal value function, see Proposition 5.1 in Delong (2019). We note that ... We consider a class of strategies which is potentially smaller than the class $\mathcal{B}$ since we require that the objective functions (3.3)-(3.4) are smooth for the strategies considered in Theorem 4.1. This assumption is reasonable since in this paper we work with smooth (classical) solutions to HJB equations and PDEs. In Theorem 3.1 we assume that the equilibrium value function (i.e. the objective function for our optimization problem for the equilibrium strategy) is a smooth solution to HJB equations. In Proposition 5.1 below we prove that the candidate first-order approximation to the equilibrium value function is a smooth solution to PDEs. Finally, Remark 4.1.b shows that the optimal value function for the time-consistent optimization problem with constant risk aversion (i.e. the objective function for our optimization problem with $\epsilon = 0$) is also smooth.
Theorem 4.1 gives a more rigorous justification for the investment strategy derived in Delong (2019). The assertions (i)-(ii) from Theorem 4.1 are intuitively clear in view of Remark 4.1.b. The zeroth-order investment strategy $\pi_0^*$ postulated in Theorem 3.1 and by Delong (2019) performs better than any strategy $\pi_0$ when we compare the asymptotic expansions of the objective functions up to order $O(1)$ as $\epsilon \to 0$. If we want to study investment strategies which are series expansions in powers of $\epsilon$, then, by perturbation theory and Remark 4.1.b, it is natural to consider expansions around the strategy $\pi_0^*$. The most interesting are the assertions (iii)-(iv) from Theorem 4.1, where we show that the first-order investment strategy $\pi_0^* + \epsilon\pi_1^*$ postulated in Theorem 3.1 and by Delong (2019) is the equilibrium strategy in a reasonable class of strategies $\pi_0^* + \epsilon\pi_1$ when we compare the asymptotic approximations to the objective functions up to order $O(\epsilon^2)$ as $\epsilon \to 0$. The criterion (4.5) is a modification of the well-established criterion (3.5) for the equilibrium in continuous-time models. In (3.5) we compare the objective functions for the exponential utility maximization problem with the risk aversion coefficient $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$ for the strategies $\pi^*$ and $\pi^\delta$. In (4.5) we use the asymptotic expansions (4.4) of the objective functions for the exponential utility maximization problem with the risk aversion coefficient $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$ for the strategies $\pi_0^* + \epsilon\pi_1^*$ and $\pi_0^* + \epsilon\pi_1^\delta$ and compare the terms in these expansions up to order $O(\epsilon^2)$. To the best of our knowledge the criterion (4.5) is new and has not been investigated in the literature. We point out that (4.5) is not related to $\epsilon$-equilibrium.

5 The proof of the main result
First, we introduce operators associated with the continuous parts of the processes $(X^\pi, P, R)$.
Next, we briefly recall some results from Delong (2019) which we will use in the sequel.
Proof of Theorem 3.1: By Theorem 3.1 from Delong (2019), the equilibrium strategy and the equilibrium value function for (3.2) are characterized with the system of HJB equations (5.1)-(5.2) for $k \in \{0, 1, \ldots, n\}$. If we assume the risk aversion coefficient $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$ with small $\epsilon > 0$, then we can postulate the first-order expansions (5.3)-(5.4) for the solutions to the HJB equations (5.1)-(5.2). We also assume that derivatives of $(v^k)_{k=0}^n$, $(w^k)_{k=0}^n$ satisfy the first-order expansions of the same form (5.3)-(5.4). From equation (5.1), we can now deduce the first-order expansion (5.5) for the equilibrium strategy. We substitute the expansions for $(v^k)_{k=0}^n$, $(w^k)_{k=0}^n$ and $(\pi^{k,*})_{k=0}^n$ into the system of HJB equations (5.1)-(5.2). We collect the terms of order $O(1)$, $O(\epsilon)$, $O(\epsilon^2)$ and set them to zero. We can derive the system of PDEs (5.8)-(5.11) for $k = 0, 1, \ldots, n$. We can find the solutions to the PDEs (5.8)-(5.11). These solutions are given by (5.12)-(5.15), where the functions $(h^k)_{k=0}^n$ and $(g^k)_{k=0}^n$ solve the PDEs (3.7) and (3.8). The first-order approximation to the equilibrium strategy (5.5) is determined with (3.9)-(3.10).
2. In addition, assume that (A7) there exist mixed derivatives $(h^k_{tp})_{k=0}^n \in C([0, T) \times (0, \infty))$.

Let us define
The processes (
We are now heading towards the proof of our main result. We prove Theorem 4.1 by using a series of lemmas and propositions.
Proof: Assertion (i): We choose $\pi = \pi_0 + \epsilon\pi_1 \in \mathcal{B}$ from Definition 4.1. We will show that all points from Definition 3.1 are satisfied. Point 1 is obvious. Point 2 follows from the growth conditions for $\pi_0$ and $\pi_1$. Point 3 can be deduced from Theorem V.7 in Protter (2005) since $\pi_0$ and $\pi_1$ are process Lipschitz. We are left with point 4. Let us introduce the process $Y$ ... The process $Y$ is used to define the solution to the exponential utility maximization problem (3.2) with the constant risk aversion coefficient $\Gamma(r) = \gamma_0 = \gamma$, see Theorem 5.1 in Delong (2019). We can show that ..., where $v_0^k(t, x, p)$ is the optimal value function for the time-consistent exponential utility maximization problem for the initial point $(t, x, p, k)$. We choose $r \in \mathbb{R}$ and set $\gamma_1 := \gamma_1(r)$. We choose $t \in [0, T]$. We have the following decomposition: ..., where we introduce the strategy $\tilde{\pi}(s) = \gamma_1\pi_0(s) + (\gamma_0 + \epsilon\gamma_1)\pi_1(s)$, $0 \le s \le T$.
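The role of the auxiliary strategy $\tilde{\pi}$ can be seen from a one-line algebraic identity. The computation below is a sketch assuming, as elsewhere in the paper, that $\pi = \pi_0 + \epsilon\pi_1$ and $\Gamma(r) = \gamma_0 + \epsilon\gamma_1$ with $\gamma_1 := \gamma_1(r)$ fixed; the full decomposition used in the proof is not reproduced in this text.

```latex
% Sketch: splitting Gamma(r) * pi into a zeroth-order part and an
% epsilon-weighted remainder (time argument s suppressed).
\begin{align*}
\Gamma(r)\,\pi
  &= (\gamma_0 + \epsilon\gamma_1)(\pi_0 + \epsilon\pi_1) \\
  &= \gamma_0\pi_0 + \epsilon\gamma_1\pi_0
     + \epsilon\gamma_0\pi_1 + \epsilon^2\gamma_1\pi_1 \\
  &= \gamma_0\pi_0
     + \epsilon\big(\gamma_1\pi_0 + (\gamma_0 + \epsilon\gamma_1)\pi_1\big) \\
  &= \gamma_0\pi_0 + \epsilon\,\tilde{\pi} .
\end{align*}
```

The first term is what appears in the constant-risk-aversion problem, so it can be handled with the known results for $\gamma_0$, while everything that depends on the wealth-dependent part is collected in $\epsilon\tilde{\pi}$ and becomes small as $\epsilon \to 0$.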

From point 3 from Definition 4.1 and (A6), we deduce that the process
We now study the expected value (5.20). By Hölder's inequality and boundedness of $\alpha, \beta, \eta$, we can derive (5.21) for a sufficiently small $q_1 > 1$ and its conjugate $q_1^* > 1$. We can choose a sufficiently small $q_1 > 1$ such that $\gamma_0 q_1 = \gamma_0 + \epsilon\gamma_1(r) = \Gamma(r)$, if a sufficiently small $\epsilon > 0$ is used. Consequently, by point 4 from Definition 4.1, the first expected value in (5.21) is finite. As far as the second expected value is concerned, we introduce the process $M$ ... The process $M$ is an exponential martingale generated by a BMO-martingale since (5.19) holds. Consequently, applying Hölder's inequality and the reverse Hölder inequality to the exponential martingale, see Theorem 3.1 in Kazamaki (1997), we get ... for a sufficiently small $q_2 > 1$ and its conjugate $q_2^* > 1$. We remark that the constant $q_2$ depends on $\big\| \int_0^T q_1^* \pi(s)\sigma\,dW(s) \big\|_{BMO}$. Finally, for a sufficiently small $\epsilon$, we have the inequality ..., see Kazamaki (1997). Collecting (5.21) and (5.23), we can conclude that the expected value (5.20) is a.s. finite and our strategy $\pi$ satisfies point 4 from Definition 3.1. Hence, $\pi \in \mathcal{B}$ implies that $\pi \in \mathcal{A}$.
Assertion (ii): Point 1 from Definition 4.1 is obvious. Points 2-3 can be deduced from (A6) and the properties specified in Proposition 5.1. In particular, the properties that the mapping $p \mapsto h^k(t, p)$ is Lipschitz continuous on $(0, \infty)$ uniformly in $t \in [0, T]$ and $h^k \in C([0, T] \times (0, \infty)) \cap C^{1,2}([0, T) \times (0, \infty))$ imply that the derivative $(t, p) \mapsto h_p^k(t, p)$ is uniformly bounded and jointly continuous on $[0, T) \times (0, \infty)$. In the definition of the investment strategy (3.13) we choose the left limit $\lim_{t \to T-} h_p^k(t, P(t, \omega))$ and we have a continuous, finite mapping $t \mapsto h_p^k(t, P(t, \omega))P(t, \omega)$ on $[0, T]$ for a.a. $\omega$. The same arguments hold for $g_p^k(t, p)$. We have to prove point 4. In fact, we only have to prove that the first expected value in (5.21) is finite if $\pi_0^*$ is used. By Remark 4.1.b the strategy $\pi_0^*$ is the optimal investment strategy for the optimization problem (3.2) with constant risk aversion, see also Theorems 5.1, 6.1 in Delong (2019). From properties of the optimal value function (5.17) for the time-consistent exponential utility maximization problem (3.2) with the constant risk aversion $\Gamma(r) = \gamma_0$, we can deduce that ... is an exponential martingale generated by a BMO-martingale, see eq. (8.12) in Delong (2019) or a general theory in Hu et al. (2005). Hence, by the reverse Hölder inequality, we can choose a sufficiently small $q_1 > 1$ such that ... We can now use the same arguments as in the first part of the proof.
Lemma 5.1. Let $\pi \in \mathcal{A}$ denote an admissible strategy for the utility maximization problem (3.2) with the wealth-dependent risk aversion coefficient $\Gamma(r) = \gamma_0 + \epsilon\gamma_1(r)$ with $\epsilon > 0$, and let $(v_0^k, v_1^k, w_0^k, w_1^k)_{k=0}^n$ denote the solutions to the PDEs (5.8)-(5.11). ...
Proof: The solutions to (5.8)-(5.11) are given by (5.12)-(5.15). By Proposition 5.1, the functions $(h^k)_{k=0}^n$, $(g^k)_{k=0}^n$ are bounded in $(t, p, k)$. Since $\gamma_1$ is bounded by (A6), it is sufficient to prove that the families $\{e^{-\gamma_0 X^\pi(\mathcal{T})}, \mathcal{T}$ is an $\mathbb{F}$-stopping time$\}$ and $\{e^{-\gamma_0 X^\pi(\mathcal{T})} X^\pi(\mathcal{T}), \mathcal{T}$ is an $\mathbb{F}$-stopping time$\}$ are uniformly integrable for any $\pi \in \mathcal{A}$. We choose $\pi \in \mathcal{A}$. Points 2 and 4 from Definition 3.1 and the assumption (A6) that $\gamma_1(0) = 0$ imply that the family $\{e^{-\gamma_0 X^\pi(\mathcal{T})}, \mathcal{T}$ is an $\mathbb{F}$-stopping time$\}$ is uniformly integrable, see Remark 8 in Hu et al. (2005). We now consider the second family. We choose a sufficiently small $q > 1$. We have the inequality (5.25), where we choose a sufficiently small $\kappa > 1$, and $\kappa^*$ denotes its conjugate. Since we can set $\gamma_0\kappa q = \gamma_0 + \epsilon\gamma_1(r) = \Gamma(r)$ for some $r \in \mathbb{R}$ and sufficiently small $\epsilon > 0$, $\kappa > 1$, $q > 1$, the first term in (5.25) is finite by uniform integrability of $\{e^{-\gamma_0\kappa q X^\pi(\mathcal{T})}, \mathcal{T}$ is an $\mathbb{F}$-stopping time$\}$ (by points 2 and 4 from Definition 3.1 and the arguments from above). As far as the second term in (5.25) is concerned, let us recall the dynamics (3.1) for the process $X^\pi$. For any $\kappa^* > 1$ and $q > 1$, we have the inequalities (5.26), where we use the Burkholder-Davis-Gundy inequality and the energy inequality (see e.g. page 29 in Kazamaki (1997)).
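The inequality (5.25) is not reproduced in this text. A plausible form, assuming it is a direct application of Hölder's inequality with the conjugate pair $(\kappa, \kappa^*)$, is sketched below; the shorthand $Z$ for the stopped wealth is a hypothetical notation introduced only for this illustration.

```latex
% Sketch (assumed form of the Hoelder step; Z := X^\pi(\mathcal{T}) is a
% hypothetical shorthand, q > 1, \kappa > 1, 1/\kappa + 1/\kappa^* = 1).
\begin{align*}
E\Big[\big|e^{-\gamma_0 Z} Z\big|^{q}\Big]
  &= E\Big[e^{-\gamma_0 q Z}\,|Z|^{q}\Big] \\
  &\le \Big(E\big[e^{-\gamma_0 \kappa q Z}\big]\Big)^{1/\kappa}
       \Big(E\big[|Z|^{q\kappa^*}\big]\Big)^{1/\kappa^*} .
\end{align*}
% If both factors are bounded uniformly over stopping times \mathcal{T},
% the family {e^{-\gamma_0 Z} Z} is bounded in L^q with q > 1 and is
% therefore uniformly integrable.
```

This explains the choice $\gamma_0\kappa q = \Gamma(r)$ in the proof: it turns the first factor into an exponential moment that is controlled by the admissibility conditions, while the second factor is a plain moment of the wealth process that can be estimated with martingale inequalities.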
Proof: This is a modification of a well-known result which concerns uniform integrability of conditional expectations. We choose π ∈ A.

Consequently, by
Step 3, for any $\delta_0 > 0$, we can choose $\delta$ and $C$ such that ... We conclude that the family $w^{J(\mathcal{T}),\pi}\big(\mathcal{T}, X^\pi(\mathcal{T}), P(\mathcal{T}), R(\mathcal{T})\big)$ indexed with stopping times $\mathcal{T}$ is uniformly integrable. The remaining families of random variables can be studied in exactly the same way.
Step 1d: We improve the estimates (5.20)-(5.23). Let us choose sufficiently small $q > 1$, $\kappa > 1$, $\iota > 1$, and let $q^*, \kappa^*, \iota^*$ denote their conjugates. We introduce the martingales ... We note that ... for all $\epsilon \in [0, \epsilon_0]$, and the constant $K$ is independent of $\epsilon$. Consequently, by Theorem 3.1 from Kazamaki (1997), for all $\epsilon \in [0, \epsilon_0]$ we can find a universal, sufficiently small $\iota > 1$ such that all stochastic exponentials of $\int_0^t \epsilon q\kappa^*\gamma_0\pi_1(s)\sigma\,dW(s)$ satisfy the reverse Hölder inequality with the common power $\iota$. The reverse Hölder inequality gives us the estimate ... for all $\epsilon \in [0, \epsilon_0]$, and the constant $K$ depends on $\iota$ but is independent of $\epsilon$. Using the arguments from the proof of Proposition 5.2 together with Doob's inequality, we can now conclude that (5.34) holds with some $r > 1$, for all $\epsilon \in [0, \epsilon_0]$. The constants $K_1, K_2, r$ in (5.34) are independent of $\epsilon$. We show, in this step and in the sequel, that our constants are independent of $\epsilon$ since in Steps 3-4 we prove that the approximation error is of order $O(\epsilon^2)$, where $O(\epsilon^2)$ is defined by (1.1) and the constant $K$ for the approximation error in (1.1) must be independent of $\epsilon$.
We also improve the estimate (5.26). Let us choose any $q > 1$. Applying the Burkholder-Davis-Gundy inequality as in the proof of Lemma 5.1, we can show that ... However, the dependence of constants on the applied strategies will not be pointed out if this dependence is not needed for the proof.