Optimal investment for insurance company with exponential utility and wealth-dependent risk aversion coefficient

We investigate an exponential utility maximization problem for an insurer who faces a stream of non-hedgeable claims. The insurer’s risk aversion coefficient changes in time and depends on the insurer’s current net asset value (the excess of assets over liabilities). We use the notion of an equilibrium strategy and derive the HJB equation for our time-inconsistent optimization problem. We assume that the insurer’s risk aversion coefficient consists of a constant risk aversion and a small amount of wealth-dependent risk aversion. Using perturbation theory, the equilibrium value function, which solves the HJB equation, is expanded in the parameter controlling the degree of wealth dependence of the risk aversion. We find the first-order approximations to the equilibrium value function and the equilibrium investment strategy. Some new results for the exponential utility maximization problem with constant risk aversion are derived in order to approximate the solution to our exponential utility maximization problem with wealth-dependent risk aversion.


Łukasz Delong, lukasz.delong@sgh.waw.pl, Department of Probabilistic Methods, Collegium of Economic Analysis, Warsaw School of Economics SGH, Niepodległości 162, Warsaw 02-554, Poland

Introduction
Optimal investment problems are extensively studied in financial mathematics, and the key example is the exponential utility maximization problem. Among many papers in this field, we can mention the works by Hu et al. (2005), Morlais (2009), Ankirchner et al. (2010), Lim and Quenez (2011), Jiao et al. (2013) and Jeanblanc et al. (2015). In these papers the authors consider dynamic investment problems for an agent who values his terminal wealth with an exponential utility whose absolute risk aversion coefficient is constant in time. However, when deciding on dynamic asset allocation, it seems more reasonable to assume that the investor's risk preferences are time-varying.
The motivation for considering time-varying and stochastic risk preferences is clear. In a bull market investors are willing to take more risk, which should be modeled with a lower risk aversion coefficient, whereas in a bear market investors are willing to take less risk, which should be modeled with a higher risk aversion coefficient. Hence, a coefficient of risk aversion depending on the state of the economy should be used in dynamic portfolio selection problems. Pirvu and Zhang (2013) and Kwak et al. (2014) study exponential utility indifference pricing and optimal investment strategies under exponential utility with a regime-switching risk aversion coefficient. Gordon and St-Amour (2000) show that a state-dependent risk aversion can explain asset price movements which cannot be explained by constant risk aversion. There is also strong empirical evidence that the degree of risk aversion depends on prior gains and losses, or on the available wealth in general. Thaler and Johnson (1990) claim that after a gain on a prior gamble people are more risk seeking than usual, while after a prior loss they become more risk averse. The observation that the risk aversion goes down after a prior gain is called the "house money" effect.
We investigate an exponential utility maximization problem for an insurer who faces a stream of non-hedgeable claims. The policyholders are entitled to annuity, life insurance and endowment benefits. The benefits are contingent on a non-tradeable financial index correlated with a stock available for trading in the financial market. The deaths of the policyholders and the benefits' occurrence times are modelled with a counting process. We assume that the insurer's risk aversion coefficient changes in time and its value depends on the insurer's current net asset value (the excess of assets over liabilities). If the assets are above the liabilities, then the insurer is less risk averse and is willing to implement a riskier investment strategy. If the assets are below the liabilities, the insurer is more risk averse and switches to a more conservative investment strategy. Hence, we take into account the "house money" effect when the insurer solves his asset allocation problem. To the best of our knowledge, there is only one paper (Dong and Sircar 2014) which studies exponential utility maximization for an investor with wealth-dependent risk aversion. At the same time we can find papers in which mean-variance optimization problems with wealth-dependent risk aversion coefficients are considered, see e.g. Zeng and Li (2011) and Kronborg and Steffensen (2015).
It is known that exponential utility maximization problems with a time-varying risk aversion coefficient are time-inconsistent and classical techniques of stochastic control cannot be applied. We follow the game-theoretic approach from Ekeland and Lazrak (2006), Ekeland and Pirvu (2008) and Björk et al. (2017) and we derive the HJB equation for our time-inconsistent optimization problem with wealth-dependent risk aversion. The HJB equation characterizes the so-called equilibrium investment strategy and the equilibrium value function. In order to solve our HJB equation, we use the expansion techniques from Fouque et al. (2011), Fouque et al. (2014), Fouque and Hu (2017) and Dong and Sircar (2014). We assume that the insurer's risk aversion coefficient consists of a constant risk aversion and a small amount of wealth-dependent risk aversion. We apply perturbation theory and expand the solution to the HJB equation in the parameter controlling the degree of wealth dependence of the risk aversion. In the first step, we investigate an exponential utility maximization problem for an insurer with a constant risk aversion coefficient and we derive some new results for this problem. In particular, we investigate the derivative of the value function with respect to the risk aversion coefficient. We show existence of solutions to systems of nonlinear BSDEs and nonlinear PDEs which describe the value function for our exponential utility maximization problem with constant risk aversion and the derivative of the value function with respect to risk aversion. We show that the PDEs have smooth solutions. Finally, we use these results to postulate the first-order approximation to the solution to our HJB equation. We derive the first-order approximations to the equilibrium value function and the equilibrium investment strategy. Our first-order approximation to the equilibrium investment strategy is new and agrees with intuition.
Dong and Sircar (2014) investigate time-inconsistent optimization problems, including an indifference pricing problem for a terminal claim under exponential utility with a wealth-dependent risk aversion coefficient. They also assume that a small amount of wealth-dependent risk aversion is added to constant risk aversion and apply perturbation theory to find the first-order approximation to the solution to their HJB equation. Our model and results are much more general than the model and results from Dong and Sircar (2014). We consider an insurance portfolio where the run-off is modelled with a counting process and the insurer is exposed to a stream of non-hedgeable claims of three different types. Since we consider an insurance portfolio with an arbitrary number of policies, we study a recursive system of HJB equations. The results presented in Dong and Sircar (2014) are heuristic and in a summary form, whereas we present formal proofs of our results. We use not only PDEs but also BSDEs to characterize the first-order approximation to the solution. Finally, Dong and Sircar (2014) are only interested in the exponential utility indifference price of a terminal claim and they do not give the first-order equilibrium investment strategy for their problem.
The remainder of the paper is organized as follows. Sections 2 and 3 describe the model and the optimization problem. In Sect. 4 we recall perturbation theory and explain the idea behind the (asymptotic) first-order approximation to a solution to a problem. In Sect. 5 we investigate an exponential utility maximization problem with constant risk aversion coefficient whereas in the subsequent Sect. 6 we study an exponential utility maximization problem with wealth-dependent risk aversion. Section 7 contains some examples which illustrate our key result from Sect. 6. All proofs are presented in Sect. 8.

The financial and insurance model
We deal with a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a filtration $\mathbb{F} = (\mathcal{F}_t)_{0 \le t \le T}$ and a finite time horizon $T < \infty$. On the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ we define a standard two-dimensional Brownian motion $(W, B) = (W(t), B(t), 0 \le t \le T)$ and a càdlàg (right-continuous with left limits) counting process $N = (N(t), 0 \le t \le T)$. The uncorrelated Brownian motions $(W, B)$ are used to model the financial risk, and the σ-algebra generated by $(W(u), B(u), u \in [0, t])$ contains information on the evolution of the financial indices. The counting process $N$ is used to model the insurance risk and contains information on the number of in-force policies in the insurance portfolio. We assume that (A1) the Brownian motions $(W, B)$ and the counting process $N$ are independent. Under assumption (A1), the financial risk is independent of the insurance risk. As far as the filtration $\mathbb{F}$ is concerned, we use the standard approach of progressive enlargement of the Brownian filtration. The filtration $\mathbb{F}$ is right-continuous and completed with sets of measure zero.
The financial market consists of a risk-free deposit $D = (D(t), 0 \le t \le T)$ and two risky indices: $S = (S(t), 0 \le t \le T)$ and $P = (P(t), 0 \le t \le T)$. The value of the risk-free deposit is constant, i.e. we assume that the interest rate is zero or we consider discounted quantities in our problem. The prices of the risky indices are modelled with correlated Brownian motions. We assume that the prices of S and P satisfy the dynamics
$$dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t), \qquad dP(t) = a P(t)\,dt + b P(t)\big(\rho\,dW(t) + \sqrt{1-\rho^2}\,dB(t)\big),$$
where $\mu, a, \sigma, b$ are positive constants which denote drifts and volatilities and $\rho \in [-1, 1]$ denotes the correlation coefficient between the log-returns of S and P. The insurance company can invest in the deposit D and in the index S. The index P is not available for trading. The index P is the underlying investment fund for the insurance contracts sold by the insurance company. We use two indices in our model since in practice equity-linked life insurance contracts may be contingent on non-tradeable indices.
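To make the two-index market concrete, the following minimal sketch simulates one path of (S, P). It assumes geometric Brownian dynamics dS/S = μ dt + σ dW and dP/P = a dt + b(ρ dW + √(1−ρ²) dB), which is one natural reading of the description above and not a quotation of the paper's equations. The function names and parameter values are hypothetical.

```python
import math
import random


def simulate_indices(mu, sigma, a, b, rho, s0, p0, horizon, steps, rng):
    """Simulate one path of (S, P) with exact log-normal steps.

    Assumed dynamics (an illustration, not taken verbatim from the paper):
        dS/S = mu dt + sigma dW
        dP/P = a dt + b (rho dW + sqrt(1 - rho^2) dB)
    """
    dt = horizon / steps
    s, p = [s0], [p0]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # increment of W
        db = rng.gauss(0.0, math.sqrt(dt))  # increment of B, independent of W
        dz = rho * dw + math.sqrt(1.0 - rho * rho) * db  # noise driving P
        s.append(s[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * dw))
        p.append(p[-1] * math.exp((a - 0.5 * b ** 2) * dt + b * dz))
    return s, p


def log_return_correlation(s, p):
    """Sample correlation between the log-returns of two price paths."""
    rs = [math.log(s[i + 1] / s[i]) for i in range(len(s) - 1)]
    rp = [math.log(p[i + 1] / p[i]) for i in range(len(p) - 1)]
    ms, mp = sum(rs) / len(rs), sum(rp) / len(rp)
    cov = sum((x - ms) * (y - mp) for x, y in zip(rs, rp))
    var_s = sum((x - ms) ** 2 for x in rs)
    var_p = sum((y - mp) ** 2 for y in rp)
    return cov / math.sqrt(var_s * var_p)
```

Under these assumptions the sample correlation of the log-returns should be close to ρ for a long path, which is the sense in which ρ is called the correlation coefficient between the log-returns of S and P.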
The insurance company keeps a homogeneous portfolio of n unit-linked policies. The counting process N is used to count the number of deaths in the insurance portfolio. We assume that the lifetimes of the policyholders are independent and exponentially distributed, i.e. we assume that (A2) the counting process $N$ has the intensity $(n - N(t-))\lambda$ with $\lambda > 0$. The parameter λ denotes the mortality intensity for the policyholders. Since mortality intensity depends on age, we should assume that λ depends on time t. Such a modification of (A2) can be easily introduced. However, we keep (A2) to simplify the presentation of our results. The process $n - N(t)$ counts the number of policies in force in the insurance portfolio.
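As a small illustration of the mortality model, the sketch below draws n independent exponential lifetimes with parameter λ and counts the policies still in force at time t. The expected count is n·e^{−λt}. The function names and the numerical values are hypothetical and serve only as an example.

```python
import math
import random


def policies_in_force(n, lam, t, rng):
    """Count policies in force at time t for n independent Exp(lam) lifetimes."""
    lifetimes = [rng.expovariate(lam) for _ in range(n)]
    return sum(1 for tau in lifetimes if tau > t)


def expected_in_force(n, lam, t):
    """Expected number of policies in force at time t: n * exp(-lam * t)."""
    return n * math.exp(-lam * t)
```

Averaging `policies_in_force` over many simulated portfolios should reproduce `expected_in_force` up to Monte Carlo error.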
The insurer faces a stream of non-hedgeable claims which is modelled with the process $C = (C(t), 0 \le t \le T)$. The process C is described by Eq. (2.4). Each policyholder in the insurance portfolio is entitled to three types of benefits: an annuity α paid as long as the policyholder lives, a life insurance benefit β paid if the policyholder dies and an endowment benefit η paid if the policyholder survives till the terminal time T. The benefits α, β and η are contingent on the value of the index P. We assume that (A3) the functions $\alpha, \beta, \eta : (0, \infty) \to [0, \infty)$ are bounded and Lipschitz continuous.
In order to fulfill the future obligations, the insurer must hold a reserve. The reserve is set for the policies in force. The reserve is defined in (2.5), where Q denotes a pricing measure for C. Here, by reserve we mean an amount of money which the insurer sets aside to cover the future benefits. The insurer can choose any pricing measure to calculate the reserve (2.5). We don't make any assumptions on the pricing measure Q in (2.5). However, we assume that condition (A4) holds. In the sequel, the reserve for one policy in force, $F^1$, is simply denoted by F. If the counting process N is independent of (S, P) under the pricing measure Q and the prices of the pay-offs α, β, η are smooth functions of time and the underlying index P, then (A4) is satisfied.

where $x > 0$ denotes the initial wealth. We assume that the survival benefits η are subtracted from $X^\pi(T)$ at the terminal time T. In this paper we study the optimization problem (3.2), where Γ denotes a time-varying risk aversion coefficient whose value at time t depends on the process $R(t) = X^\pi(t) - (n - N(t))F(t, P(t))$. The process R is interpreted as the insurer's net asset value, i.e. the excess of the insurer's assets over his liabilities. By the liability we mean the value of the reserve (2.5). The dynamics of the net asset value process R follow from the dynamics of the wealth process and of the reserve. We assume that the risk aversion coefficient in (3.2) satisfies: (A5) $\Gamma : \mathbb{R} \to (0, \infty)$ is bounded, decreasing, Lipschitz continuous and of class $C^2(\mathbb{R})$.
The motivation for considering the wealth-dependent risk aversion in the optimization problem (3.2) is the following. At different points in time, the insurer is likely to have different exponential utilities which are characterized by different risk aversion coefficients. We expect that the insurer's risk aversion coefficient should change in time and the dynamics of the risk aversion coefficient should be modelled with an adapted process related to some observable factors. It is very reasonable to assume that the risk aversion coefficient and the willingness to take financial risk depend on the financial position of the investor. We assume that the value of the insurer's risk aversion coefficient at time t depends on the insurer's current net asset value. If the assets are above the liabilities, then the insurer is less risk averse and is willing to implement riskier investment strategies. If the assets are below the liabilities, the insurer is more risk averse and switches to more conservative investment strategies. Hence, the risk aversion coefficient Γ should be a decreasing function of the net asset value.
Let us introduce the set of admissible investment strategies for (3.2).
The above definition of admissible investment strategies is standard for exponential utility maximization problems, see e.g. Hu et al. (2005) and Jeanblanc et al. (2015), except for point 4 where we require that the expected utility of the terminal wealth exists for all risk aversion coefficients defined by Γ. However, this requirement is clear since we aim at solving an exponential utility optimization problem with a risk aversion coefficient which changes in time. Let us remark that points 2, 4 and boundedness of η imply that the family $\{e^{-\Gamma(r)X^\pi(\tau)},\ \tau \text{ is an } \mathbb{F}\text{-stopping time}\}$ is uniformly integrable for $\pi \in \mathcal{A}$ and $r \in \mathbb{R}$, which is often used in the definition of an admissible strategy instead of points 2 and 4, see Remark 8 in Hu et al. (2005). From a financial point of view, points 2 and 4 of Definition 3.1 or the uniform integrability of this family exclude arbitrage investment strategies from considerations, see Remark 2 in Hu et al. (2005). The assumption of uniform integrability is slightly weaker than the other common assumption that the wealth process should be bounded from below, which is used to introduce so-called tame arbitrage-free strategies as admissible strategies, see Definition 3 in Levental and Skorohod (1995). Tame strategies limit borrowing and prevent doubling strategies.
The optimization problem (3.2) is an exponential utility maximization problem for an investor with wealth-dependent risk aversion coefficient Γ. The objective function for (3.2) is defined in (3.3). The objective function (3.3) is well-defined for any $\pi \in \mathcal{A}$ by point 4 of Definition 3.1. However, the optimization problem (3.2) is time-inconsistent and Bellman's principle of optimality cannot be used to find the optimal strategy and the optimal value defined by $\sup_{\pi \in \mathcal{A}} v^{k,\pi}(t, x, p)$. We use the game-theoretic approach developed by Ekeland and Lazrak (2006), Ekeland and Pirvu (2008) and Björk et al. (2017). In order to find the solution to (3.2), we consider a game played by a continuum of players with different utilities where the player at time t has its own risk aversion coefficient and only chooses the strategy at time t. We look for the sub-game perfect Nash equilibrium in the game with the reward given by (3.3).
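Definition 3.2 Let us consider an admissible strategy $\pi^* \in \mathcal{A}$. Fix an arbitrary point $(t, x, p, k) \in [0, T) \times \mathbb{R} \times \mathbb{R} \times \{0, 1, \ldots, n\}$ and choose an admissible strategy $\pi \in \mathcal{A}$. For $\delta > 0$ we define a new admissible strategy
$$\pi_\delta(s) = \begin{cases} \pi(s), & t \le s < t + \delta, \\ \pi^*(s), & t + \delta \le s \le T. \end{cases}$$
If
$$\liminf_{\delta \to 0} \frac{v^{k,\pi^*}(t, x, p) - v^{k,\pi_\delta}(t, x, p)}{\delta} \ge 0$$
for all $(t, x, p, k) \in [0, T) \times \mathbb{R} \times \mathbb{R} \times \{0, 1, \ldots, n\}$ and $\pi \in \mathcal{A}$, then $\pi^*$ is called an equilibrium strategy and $v^{k,\pi^*}$ is called the equilibrium value function corresponding to the equilibrium strategy $\pi^*$.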
In order to characterize the equilibrium value function and the equilibrium strategy with an HJB equation, we need to introduce the second function $w^k$, defined in (3.4). The function $w^k$ gives the value of the objective (3.3) for the optimization problem with the risk aversion depending on an auxiliary parameter r. The function $w^k$ describes the time-consistent part of the time-inconsistent optimization problem. Under the game-theoretic approach, the agent at time t forms a coalition for an infinitesimal time period and solves a time-consistent exponential utility maximization problem with a constant risk aversion coefficient over the infinitesimal time period. The value function for this optimization problem at time t is determined by $w^k(t, x, p, r)$ where $r = x - kF(t, p)$. However, the evolution of $w^k(t, x, p, r)$ cannot characterize the dynamics of the value function of the time-inconsistent optimization problem with time-varying risk aversion since the variable r is held fixed in the definition of $w^k$. Hence, we need the function $v^k$ and its dynamics to fully characterize the equilibrium strategy and the equilibrium value function of the exponential utility maximization problem with time-varying risk aversion.
We finish this section by presenting the HJB equation and a verification theorem for our time-inconsistent optimization problem (3.2). First, we introduce operators associated with the continuous parts of $(X^\pi, P, R)$.

Definition 3.3 Let L π
k and M π k denote second order differential operators given by The operators L π k and M π k are defined, respectively, for x, p, r ) only acts on (t, x, p) and r is kept as a constant.

Perturbation theory and first-order approximations
It is known that it is hard to solve HJB equations for time-inconsistent optimization problems, see Ekeland and Lazrak (2006), Ekeland and Pirvu (2008), Björk et al. (2017), Ekeland et al. (2012) and Dong and Sircar (2014). In particular, we are not able to solve our HJB equations (3.5)-(3.6) since standard separation methods cannot be applied and we cannot split the variables in v k and w k . We use perturbation theory to approximate the solutions to the HJB equations (3.5)-(3.6).
Perturbation theory deals with finding an approximate solution to a problem by starting from the exact solution of a related, simpler problem. Perturbation theory can be applied if our problem can be formulated by adding a small term to some parameter of the exactly solvable problem. The solution to the main problem is next expanded in powers of this small parameter. The zeroth-order term in the expansion is the exact solution to the simpler problem and the higher order terms in the expansion describe deviations in the solution to the main problem from the solution of the simpler problem. Since the perturbation technique is based on adding a small parameter, we can truncate the series expansion of the solution to the main problem and keep the first two terms of the expansion as the first-order approximate solution. In financial applications, perturbation theory was developed by Fouque et al. (2011, 2014) and Fouque and Hu (2017).
It is clear that our exponential utility maximization problem with wealth-dependent risk aversion can be related to a simpler exponential utility maximization problem with constant risk aversion. In order to apply perturbation theory to solve the optimization problem (3.2), we consider a special structure of the wealth-dependent risk aversion coefficient Γ. We choose
$$\Gamma(r) = \gamma_0 + \varepsilon\gamma_1(r). \qquad (4.1)$$
We now assume that the insurer's risk aversion coefficient Γ consists of a constant risk aversion $\gamma_0 > 0$ and a small amount $\varepsilon > 0$ of wealth-dependent risk aversion $\gamma_1$. We impose the technical condition: (A6) The function $\gamma_1 : \mathbb{R} \to \mathbb{R}$ is bounded, decreasing, Lipschitz continuous and satisfies $\gamma_1(0) = 0$. The assumption that $\gamma_1(0) = 0$ is a normalizing assumption for the risk aversion coefficient. We note that if $r > 0$ then $\Gamma(r) < \gamma_0$, and if $r < 0$ then $\Gamma(r) > \gamma_0$. Since our risk aversion coefficient (4.1) consists of a constant risk aversion and a small amount of wealth-dependent risk aversion, we expect that the solution to the exponential utility maximization problem with the wealth-dependent risk aversion $\Gamma(r) = \gamma_0 + \varepsilon\gamma_1(r)$ should be expanded around the solution to the exponential utility maximization problem with the constant risk aversion $\gamma_0$. In particular, the zeroth-order approximation to the equilibrium value function and the equilibrium strategy for the time-inconsistent exponential utility maximization problem (3.2) with the wealth-dependent risk aversion (4.1) should coincide with the value function and the optimal strategy for the time-consistent exponential utility maximization problem with the constant risk aversion $\gamma_0$. Hence, in the next section we start with investigating the optimization problem (3.2) with $\Gamma(r) = \gamma_0 = \gamma$. In Sect. 5 we study some properties of the zeroth-order solution which allow us in Sect. 6 to derive the first-order correction resulting from adding a small amount of wealth-dependent risk aversion to constant risk aversion.
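For illustration only, one function that is bounded, decreasing, Lipschitz continuous, smooth and vanishing at zero is γ₁(r) = −tanh(r/c). The sketch below uses this hypothetical choice (it is not taken from the paper) and writes ε for the small parameter multiplying γ₁, so that Γ(r) = γ₀ + εγ₁(r). The scale c and the function names are assumptions.

```python
import math


def gamma1(r, scale=1.0):
    """Hypothetical wealth-dependent component: bounded, decreasing, gamma1(0) = 0."""
    return -math.tanh(r / scale)


def risk_aversion(r, gamma0, eps, scale=1.0):
    """Risk aversion Gamma(r) = gamma0 + eps * gamma1(r); positive whenever eps < gamma0."""
    return gamma0 + eps * gamma1(r, scale)
```

With this choice the risk aversion equals γ₀ when the net asset value is zero, falls below γ₀ when assets exceed liabilities and rises above γ₀ otherwise, while staying inside the interval (γ₀ − ε, γ₀ + ε).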
The goal of this paper is to establish the first-order approximations to the equilibrium value function and the equilibrium strategy for the optimization problem (3.2) in the case of risk aversion $\Gamma(r) = \gamma_0 + \varepsilon\gamma_1(r)$ with small $\varepsilon > 0$. Formally, we are interested in finding functions $(\hat v^k, \hat\pi^{*,k})_{k=0}^n$ satisfying (4.2)-(4.3), where $(v^k, \pi^{*,k})_{k=0}^n$ solve the system of the HJB equations (3.5)-(3.6) with the risk aversion $\Gamma(r) = \gamma_0 + \varepsilon\gamma_1(r)$. The functions $\hat v^k$ and $\hat\pi^{*,k}$ are the first-order approximations to $v^k(t, x, p)$ and $\pi^{*,k}(t, x, p)$ for small ε. More precisely, the functions (4.4) which satisfy (4.2)-(4.3) are called the asymptotic first-order approximations to $v^k(t, x, p)$ and $\pi^{*,k}(t, x, p)$ as $\varepsilon \to 0$. The error of the approximation in (4.2), or (4.3), is of a higher order than the approximating function and it is controlled with a function of order $O(\varepsilon^2)$, see Definitions 1.1 and 2.1 in Holmes (2013). Let us recall that $f(x, \varepsilon) = O(\varepsilon^2)$ as $\varepsilon \to 0$ if $|f(x, \varepsilon)| \le K\varepsilon^2$ for all $0 < \varepsilon < \varepsilon_0$, for some $\varepsilon_0 > 0$, where K is independent of ε but may depend on $(x, \varepsilon_0)$. For details on perturbation theory we refer e.g. to Holmes (2013). In order to clarify the idea behind finding the asymptotic first-order approximation to a solution of an equation, we present a simple example from Chapter 1.5 in Holmes (2013). Let us consider the equation
$$x^2 + 2\varepsilon x - 1 = 0. \qquad (4.5)$$
We postulate that the solution to (4.5) has the asymptotic expansion
$$x \sim x_0 + \varepsilon x_1 + \cdots.$$
We substitute the expansion into (4.5) and collect the terms of order $O(1)$, $O(\varepsilon)$, $O(\varepsilon^2)$:
$$(x_0^2 - 1) + 2\varepsilon x_0 (x_1 + 1) + O(\varepsilon^2) = 0.$$
We choose $x_0$ and $x_1$ so that the terms of orders $O(1)$, $O(\varepsilon)$ are zero. We find $x_0 = \pm 1$ and $x_1 = -1$. The solution $\hat x = \pm 1 - \varepsilon$ is the first-order approximation to the true solution $x = \pm\sqrt{\varepsilon^2 + 1} - \varepsilon$ of Eq. (4.5) with small ε, or the asymptotic first-order approximation as $\varepsilon \to 0$, since (4.2) holds. In other words, the error of approximating $x = \pm\sqrt{\varepsilon^2 + 1} - \varepsilon$ with $\hat x = \pm 1 - \varepsilon$ is a function of order $O(\varepsilon^2)$ as $\varepsilon \to 0$. We can note that the first-order approximation to the solution to (4.5) results from expanding the true solution around the exact solution to (4.5) with $\varepsilon = 0$. We will use the same reasoning in Sect. 6 where we postulate the asymptotic first-order approximation to the solution to our optimization problem (3.2) with the risk aversion coefficient (4.1). We remark that, by construction of the approximate solution inspired by perturbation theory, we only consider the wealth-dependent risk aversion coefficient $\Gamma(r) = \gamma_0 + \varepsilon\gamma_1(r)$ with small $\varepsilon > 0$.
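The order of the error in this toy example can be checked numerically. The sketch below assumes the example is the quadratic x² + 2εx − 1 = 0, whose exact roots are ±√(ε² + 1) − ε, and compares them with the first-order approximation ±1 − ε. The function names are hypothetical.

```python
import math


def exact_root(eps, sign=1):
    """Exact root of x^2 + 2*eps*x - 1 = 0 on the branch chosen by sign (+1 or -1)."""
    return sign * math.sqrt(eps * eps + 1.0) - eps


def first_order_root(eps, sign=1):
    """First-order approximation x0 + eps*x1 with x0 = sign and x1 = -1."""
    return sign * 1.0 - eps


def scaled_error(eps, sign=1):
    """Approximation error divided by eps^2; it stays bounded if the error is O(eps^2)."""
    return abs(exact_root(eps, sign) - first_order_root(eps, sign)) / eps ** 2
```

Since √(ε² + 1) = 1 + ε²/2 + O(ε⁴), the scaled error should approach 1/2 as ε decreases, which is consistent with an approximation error of order O(ε²).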

The optimization problem with constant risk aversion coefficient
Since we expect that the zeroth-order approximations to the solutions to the HJB equations (3.5)-(3.6) are given by the solution to the exponential utility maximization problem with constant risk aversion, we start with investigating the optimization problem (3.2) with $\Gamma(r) = \gamma$. First, let us introduce some spaces and their norms. Let G be some filtration. We define the objective function and the value function for the exponential utility maximization problem with constant risk aversion, see (5.2). It is known that the solution to the optimization problem (5.2) can be characterized with solutions to BSDEs or PDEs.
Let us study the system of BSDEs (5.3). Alternatively, we can consider a system of PDEs, where $(Y^k, Z_1^k, Z_2^k)_{k=0}^n$ are defined in point (i) of Proposition 5.1. The optimal solution to (5.2) is characterized in the following theorem.
is the optimal admissible investment strategy for the optimization problem = p is the value function corresponding to the strategy π * . Alternatively, we can characterize the optimal strategy (5.7) with the functions (h k ) n k=0 from Proposition 5.2.
Expansions in perturbation theory are often justified by recalling Taylor's theorem and expanding the function in powers of the small parameter ε. This implies that the term of order $O(\varepsilon)$ in the expansion is related to the first derivative of the function with respect to the parameter which is perturbed by adding ε. The value function from Theorem 5.1 depends on the risk aversion coefficient γ; in particular, the solutions $(Y^k)_{k=0}^n$ and $(h^k)_{k=0}^n$ depend on γ. Consequently, our next step is to investigate the derivative of the process $Y^k$, and the derivative of the function $h^k$, with respect to the risk aversion coefficient γ. The following propositions are crucial for establishing the first-order correction in the expansion of the equilibrium value function.
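The Taylor argument can be sketched as follows for a risk aversion of the form γ₀ + εγ₁, where γ₁ is treated as a fixed number and ε is the small parameter. The symbol ∂_γ Y^{k,γ} is shorthand introduced here for the derivative of Y^k with respect to γ and is not the paper's notation; the expansion is a heuristic for a frozen risk aversion level, not the equilibrium expansion itself.

```latex
% Heuristic first-order Taylor expansion in the risk aversion parameter.
% \partial_\gamma Y^{k,\gamma} is illustrative shorthand, not the paper's notation.
Y^{k,\gamma_0 + \varepsilon\gamma_1}(t)
  = Y^{k,\gamma_0}(t)
  + \varepsilon\,\gamma_1\,\partial_\gamma Y^{k,\gamma}(t)\Big|_{\gamma=\gamma_0}
  + O(\varepsilon^2), \qquad \varepsilon \to 0 .
```

This is why the derivative with respect to γ is the natural object to study before the first-order correction can be written down.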
Let us introduce the system of BSDEs (5.8). The last result of this section establishes the relation between the solutions to the BSDEs (5.8) and solutions to PDEs. We investigate the system of PDEs (5.10), where $(h^k)_{k=0}^n$ are defined in Proposition 5.2. We need to impose an additional smoothness condition for the functions $(h^k)_{k=0}^n$ in order to guarantee smooth solutions $(g^k)_{k=0}^n$ to (5.10). We assume that (A7) There exist mixed derivatives $(h^k_{tp})_{k=0}^n \in C([0, T) \times (0, \infty))$. Assumption (A7) is not needed if $\rho^2 = 1$, e.g. when the benefits α, β, η are contingent on the tradeable risky asset S.
where $(Y^k, Z_1^k, Z_2^k)_{k=0}^n$ are defined in Proposition 5.3.

The optimization problem with wealth-dependent risk aversion coefficient
In view of the discussion from Sect. 4, we postulate the first-order expansions (6.1)-(6.3) for the functions $(v^k)_{k=0}^n$, $(w^k)_{k=0}^n$ and the equilibrium strategy. We also assume that the derivatives of $(v^k)_{k=0}^n$, $(w^k)_{k=0}^n$ satisfy first-order expansions of the same form (6.1)-(6.2), see Chapter 1.4.3 in Holmes (2013).
From Eq. (3.5) we deduce that the true equilibrium strategy takes the form (6.4). We remark that $w_r$ in (6.4) denotes the derivative of w with respect to r evaluated at $r = x - kF(t, p)$. If the first-order expansions (6.1)-(6.2) for the functions $(v^k)_{k=0}^n$, $(w^k)_{k=0}^n$ and their derivatives are substituted into the equilibrium strategy (6.4), then we can confirm the first-order expansion for the equilibrium strategy (6.3). In the expansion (6.3) we have to use .
We can now state our main result.

Remark
In this paper we have not formally confirmed the order of the approximation error in (6.1)-(6.3), see Sect. 4 for the definition of the asymptotic first-order approximation. Hence, the strategy (6.15) is only a candidate asymptotic first-order approximation to the equilibrium investment strategy. We remark that only the order of the approximation error has not been proved, whereas the first-order approximations have been justified and formally derived on the grounds of perturbation theory, the discussion in Sect. 4 and the calculations in this section. In Delong (2018b) we study the asymptotic optimality of our investment strategy and we formally show that (6.15) performs better than any strategy in the class $\pi_0(t) + \varepsilon\pi_1(t)$ up to the second order $O(\varepsilon^2)$ in the asymptotic expansion of the value function as $\varepsilon \to 0$. We refer the reader to Delong (2018b).
Our investment strategy (6.15) agrees with intuition. The zeroth-order strategy, i.e. the first term in (6.15), is the optimal investment strategy for the insurer with constant risk aversion $\gamma_0$ who aims at maximizing the expected exponential utility of the terminal wealth. The zeroth-order strategy consists of the constant Merton strategy and the hedging strategy for the claims, which are optimal if the constant risk aversion $\gamma_0$ is used over the whole investment period. Since the insurer uses the risk aversion coefficient Γ consisting of the constant risk aversion $\gamma_0$ and the wealth-dependent risk aversion $\gamma_1$, the insurer should adjust the strategy and allow for the time-varying risk aversion. The first-order correction, the second term in (6.15), describes the first-order change in the zeroth-order strategy if the constant risk aversion coefficient $\gamma_0$ is modified by adding a small amount of the wealth-dependent component $\gamma_1$. The Merton strategy and the hedging strategy, which are optimal for the constant risk aversion $\gamma_0$, are both adjusted in (6.15) to reflect changes in the risk aversion coefficient and they now take into account the new value of the insurer's wealth-dependent risk aversion Γ at a given time.

Examples
In this section we illustrate Theorem 6.1 with examples. We investigate the BSDEs (5.3), (5.8) and the investment strategy (6.15) in some special cases relevant for insurance and financial applications.
Example 1 Let us assume that the insurer is not exposed to insurance risk and has no liability. Hence, in this example we consider a pure investment problem for an investor with the wealth-dependent risk aversion (4.1). We expect that the equilibrium strategy is related to the Merton strategy. It is easy to see that we can set $Z_1^\gamma(t) = 0$ in both (5.3) and (5.8). The first-order approximation to the equilibrium strategy is given by (7.1). We end up with the Merton strategy with the constant risk aversion $\gamma_0$, which is adjusted with a wealth-dependent term when the insurer's wealth-dependent risk aversion deviates from $\gamma_0$. One may wonder if the strategy (7.2) is the true equilibrium strategy for our time-inconsistent pure investment problem. The answer is no. The strategy (7.2) is called a naive strategy. For simplicity of presentation, let us slightly move away from the model considered in this paper and assume that $\Gamma(r) = \gamma_0/r$. The wealth process (3.1) under the strategy (7.2) takes the form (7.3).

From (3.3)-(3.4) and (7.3), we can conclude that
where $\xi_{t,T}$ is a random variable with log-normal law. If the strategy (7.2) were the true equilibrium strategy, then the functions in (7.4) would satisfy the HJB equation (3.5). In particular, from (6.4) we could recalculate the equilibrium strategy. We get a strategy which does not coincide with the strategy assumed in (7.2). Summing up, the first-order approximation (7.1) to the equilibrium strategy agrees with our intuition, but the naive strategy (7.2) is not the equilibrium strategy for our investment problem. A numerical comparison of the true equilibrium strategy and the naive strategy is presented in Delong (2018a).
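To see how a Merton strategy adjusted with a wealth-dependent term can arise algebraically, one may expand Merton's formula μ/(σ²γ) with γ replaced by γ₀ + εγ₁(r), where ε denotes the small parameter. The display below is only this elementary expansion, under the assumption that the naive strategy plugs the current risk aversion into Merton's formula; it is not a restatement of (7.1) or (7.2).

```latex
% Elementary expansion of Merton's formula in the small parameter \varepsilon.
% Assumption: the risk aversion is \gamma_0 + \varepsilon\gamma_1(r) with \gamma_0 > 0.
\frac{\mu}{\sigma^2\bigl(\gamma_0 + \varepsilon\gamma_1(r)\bigr)}
  = \frac{\mu}{\sigma^2\gamma_0}
  - \varepsilon\,\frac{\mu\,\gamma_1(r)}{\sigma^2\gamma_0^2}
  + O(\varepsilon^2), \qquad \varepsilon \to 0 .
```

The first term is the Merton strategy for the constant risk aversion γ₀. The second term is positive when γ₁(r) < 0, i.e. when the risk aversion is below γ₀, so the investor then holds more of the risky asset.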
Example 2 Let us assume that the insurer is not exposed to insurance risk but has a terminal liability η. Clearly, the Merton strategy must be complemented with a hedging strategy for η. We assume that the market is complete, i.e. the liability η is contingent on the index P which coincides with the tradeable index S. We can set $Z_1^\gamma(t) = 0$ in (5.8), but we cannot set $Z_1^\gamma(t) = 0$ in (5.3). Fortunately, we can set $Z_2^\gamma(t) = 0$ in (5.3), and we end up with the linear BSDE (7.6). The solution to (7.6) can be derived by classical techniques, see e.g. Proposition 3.3.1 in Delong (2013). The solution $Z_1^\gamma$ to (7.6) gives us the hedging strategy for η which should be applied by the insurer with the constant risk aversion γ. However, the process $Z_1^\gamma$ does not depend on the risk aversion coefficient γ. The independence of the hedging strategy of the risk aversion is due to market completeness as the liability η can be perfectly hedged. Consequently, the insurer does not have to modify the hedging strategy when his risk aversion changes. The first-order approximation to the equilibrium strategy consists of the Merton strategy and the hedging strategy for η, but only the Merton strategy with the constant risk aversion $\gamma_0$ is adjusted with a wealth-dependent term as the insurer's wealth-dependent risk aversion varies in time.

Example 3
In this example we assume that the insurer is exposed to a terminal liability η which is paid if the policyholder survives. We assume that the liability η is contingent on the index P which coincides with the tradeable index S. Since the market is incomplete due to insurance risk, the hedging strategy for η now depends on the insurer's risk aversion coefficient and should be updated when the risk aversion changes. In this example we have to solve both (5.3) and (5.8). We can set $Z^{\gamma}_2(t) = 0$ and $\mathcal{Z}^{\gamma}_2(t) = 0$. We deal with the two BSDEs (7.7) and (7.8), where $Y^{0,\gamma}(t) = -\frac{\mu^2}{2\sigma^2\gamma}(T - t)$ and $\mathcal{Y}^{0,\gamma}(t) = \frac{\mu^2}{2\sigma^2\gamma^2}(T - t)$. In the first-order approximation to the equilibrium strategy, applied if the policyholder lives, the Merton strategy and the hedging strategy for η, which are optimal for the constant risk aversion $\gamma_0$, are both adjusted with wealth-dependent and liability-dependent terms as the insurer's wealth-dependent risk aversion varies in time. If the policyholder dies, then the strategy from Example 2 is applied. The BSDE (7.8) is a linear equation and we can give a probabilistic representation of its solution, see Proposition 3.3.1 in Delong (2013). The solution to the BSDE (7.7) is investigated in Moore and Young (2003) and Ankirchner et al. (2010) in the context of different optimization problems.
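
The two closed-form expressions quoted in Example 3, $-\frac{\mu^2}{2\sigma^2\gamma}(T - t)$ and $\frac{\mu^2}{2\sigma^2\gamma^2}(T - t)$, are related by differentiation with respect to γ. The following minimal Python sketch checks this numerically with a central finite difference. The function names and the parameter values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def y0(gamma, mu, sigma, T, t):
    """First closed-form expression: -mu^2 / (2 sigma^2 gamma) * (T - t)."""
    return -mu**2 / (2.0 * sigma**2 * gamma) * (T - t)


def y0_gamma_derivative(gamma, mu, sigma, T, t):
    """Second closed-form expression: mu^2 / (2 sigma^2 gamma^2) * (T - t)."""
    return mu**2 / (2.0 * sigma**2 * gamma**2) * (T - t)


def central_difference(f, x, h=1e-5):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Illustrative (assumed) parameter values, not taken from the paper.
mu, sigma, T, t, gamma = 0.05, 0.2, 10.0, 3.0, 2.0

# Differentiate the first expression numerically in gamma ...
numerical = central_difference(lambda g: y0(g, mu, sigma, T, t), gamma)
# ... and compare with the second closed-form expression.
closed_form = y0_gamma_derivative(gamma, mu, sigma, T, t)

print(numerical, closed_form)
```

The two printed numbers agree up to the finite-difference error, which is consistent with reading the second expression as the sensitivity of the first to the risk aversion coefficient.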

Example 4
Finally, let us assume that the insurer is not exposed to insurance risk but has a terminal liability η which is contingent on the non-tradeable index P correlated with the tradeable index S. The market is incomplete due to non-hedgeable financial risk. As in the previous example, the hedging strategy for η depends on the insurer's risk aversion coefficient. We have to solve both (5.3) and (5.8), and we can set neither $Z^{\gamma}_2(t) = 0$ nor $\mathcal{Z}^{\gamma}_2(t) = 0$. We consider two BSDEs [see (7.11)]. In the resulting first-order approximation to the equilibrium strategy, the Merton strategy and the hedging strategy for η, which are optimal for the constant risk aversion $\gamma_0$, are again both adjusted with wealth-dependent and liability-dependent terms as the insurer's wealth-dependent risk aversion varies in time.

Proof of Theorem 3.1
The proof is standard and we refer to the proof of Theorem 5.2 from Björk et al. (2017).

Proof of Proposition 5.2 Assertion (i):
If $|\rho| = 1$, then we deal with the PDEs (8.7). If $|\rho| < 1$, then we introduce $\tilde{h}^k(t, p) = e^{(1-\rho^2)\gamma h^k(t, p)}$ and we deal with the PDEs (8.8). For $k = 0$ we immediately get $h^0(t, p) = -\frac{\mu^2}{2\sigma^2\gamma}(T - t)$, and uniqueness of the solution follows from Proposition 2.3 from Becherer (2005).

Equation (8.7): The result follows from Propositions 2.1 and 2.3 from Becherer (2005), which should be applied iteratively to the PDEs (8.7) starting with $k = 1$ and $h^0$. Fix $k \in \{1, \ldots, n\}$ and suppose that $h^{k-1}$ is given. Assume that $h^{k-1}$ is uniformly bounded, which is the case for $h^0$. We define an operator $A$ based on the Feynman-Kac formula and Proposition 2.1 from Becherer (2005), together with the sequence $h^k_{m+1}(t, p) = (A h^k_m)(t, p)$. Since $h^k_m(t, p)$ is uniformly bounded from below in $(t, p, m)$, it is also easy to see that $h^k_{m+1}(t, p)$ is uniformly bounded from above in $(t, p, m)$. Hence, the assumptions of Proposition 2.1 from Becherer (2005) are satisfied. We conclude that there exists a unique fixed point of the operator $A$ and a unique solution $h^k$ to the equation $h^k(t, p) = (A h^k)(t, p)$, which can be derived from $(h^k_m)_{m=0}^{\infty}$.

Next, we use Proposition 2.3 from Becherer (2005) to show that the fixed point $h^k$ is a smooth function and satisfies the PDE (8.7). We investigate smoothness properties of the successive elements in the sequence $h^k_{m+1}(t, p) = (A h^k_m)(t, p)$. Assumptions (2.9)-(2.12) from Becherer (2005) are satisfied, but (2.13) is not clear. However, a closer look at the proof [see (2.16)] shows that it is sufficient to require a condition involving $\epsilon > 0$, a bounded subset $D$ of $(0, \infty)$ such that $\bar{D} \subset (0, \infty)$, and the lower and upper bounds $K_l$, $K_u$ for the sequence $(h^k_m)_{m=0}^{\infty}$. This condition holds in our case. Hence, from Proposition 2.3 in Becherer (2005) we obtain the claimed smoothness of $h^k$.

For the PDEs (8.8), since $\tilde{h}^k_m(t, p)$ is positive and uniformly bounded away from zero in $(t, p, m)$, it is also easy to see that $\tilde{h}^k_{m+1}(t, p) = (A \tilde{h}^k_m)(t, p)$ is uniformly bounded from above in $(t, p, m)$. Hence, the assumptions of Propositions 2.1 and 2.3 from Becherer (2005) are satisfied.
Assertion (ii): The case $k = 0$ is trivial: just compare the explicit solutions to the BSDE and the PDE for $k = 0$. Fix $k \in \{1, \ldots, n\}$. Assume that $Y^{k-1}(t) = h^{k-1}(t, P(t))$, which is the case for $Y^0$ and $h^0$. Since we have a sequence of smooth functions $(h^k)_{k=0}^n$, we can apply Itô's formula to derive the dynamics of $h^k(t, P(t))$ on $[0, T - \epsilon]$ and compare the resulting dynamics with the dynamics of $Y^k$ given by (5.3) [this step is standard, see e.g. Proposition 4.3 in El Karoui et al. (1997)]. We can deduce candidate solutions for $(Y^k, Z^k_1, Z^k_2)$ on $[0, T]$.

Next, we have to prove that the candidate solutions (5.6) are in the appropriate class of processes. The candidate solution for $Y^k$ is bounded by point (i). We prove the BMO property for the candidate solutions for $(Z^k_1, Z^k_2)$. Let us choose a localizing sequence of stopping times $(\tau_m)_{m=1}^{\infty}$ for the process $P$ and a stopping time $\tau \in [0, T]$. Applying Itô's formula to $h^k$, changing the measure to $\mathbb{Q} \sim \mathbb{P}$ with the exponential martingale $\mathcal{E}\big(-\int_0^{\cdot} \frac{\mu}{\sigma}\, dW(s)\big)$ and using the PDE (5.5), we can derive (8.9). If $|\rho| = 1$, we take the square on both sides of (8.9) and the expected value. If $|\rho| < 1$, we just take the expected value. In both cases, by boundedness of $(h^k)_{k=0}^n$, α, β, we can establish an inequality from which, taking $m \to \infty$, $\epsilon \to 0$ and using the monotone convergence theorem, we deduce the BMO property [see Kazamaki (1997)]. By uniqueness of the solution to the BSDE (5.3), we have characterized $(Y^k, Z^k_1, Z^k_2)$ with $h^k$.

Proof of Theorem 5.1
Step 1: Let us assume there exists a unique solution to the BSDE (8.10) such that $(Y, Q)$ are bounded. Using standard techniques from optimal control, see e.g. Hu et al. (2005) or Chapter 11 in Delong (2013), we can prove that the strategy (8.11) is the optimal admissible investment strategy for the optimization problem (3.2) and $V^k(t, x, p) = V^{k,\pi^*}(t, x, p) = -e^{-\gamma x} e^{\gamma Y(t)}\big|_{P(t)=p,\, J(t)=k}$ is the value function corresponding to the strategy $\pi^*$.
Moreover, we have the representation (8.12), where $\mathcal{E}(M)$ denotes the stochastic exponential of the martingale $M$. Since $\int_0^t Z_2(s)\, dB(s)$ is a BMO-martingale, β and $Q$ are bounded and the process $N$ jumps upward only finitely many times, we can conclude that the product of the stochastic exponentials of martingales in (8.12) is a true martingale, see Lemma 1 in Morlais (2010) and Theorem 2.3 in Kazamaki (1997).
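
The true-martingale property of a stochastic exponential can be illustrated in the simplest possible case. The sketch below is only an illustration under strong assumptions: it takes a constant integrand θ (for instance θ = μ/σ with constant μ and σ), for which the stochastic exponential of $-\theta W$ has the closed form $\exp(-\theta W_T - \frac{1}{2}\theta^2 T)$, and checks by Monte Carlo that its expectation is one. It does not cover the BMO setting of the proof, and the function names and parameter values are hypothetical.

```python
import math
import random


def stochastic_exponential_terminal(theta, T, rng):
    """Terminal value of the stochastic exponential of -theta * W at time T.

    For a constant theta the closed form is
    exp(-theta * W_T - 0.5 * theta**2 * T), with W_T ~ N(0, T).
    """
    w_T = rng.gauss(0.0, math.sqrt(T))
    return math.exp(-theta * w_T - 0.5 * theta**2 * T)


def monte_carlo_mean(theta, T, n_paths, seed=0):
    """Sample mean of the terminal value over n_paths independent draws."""
    rng = random.Random(seed)
    total = sum(
        stochastic_exponential_terminal(theta, T, rng) for _ in range(n_paths)
    )
    return total / n_paths


# Illustrative (assumed) values: theta = mu / sigma = 0.05 / 0.2, horizon T = 1.
theta = 0.05 / 0.2
estimate = monte_carlo_mean(theta, T=1.0, n_paths=50_000)

# A true martingale started at 1 keeps expectation 1; the estimate should be
# close to 1 up to Monte Carlo error.
print(estimate)
```

With a constant integrand the martingale property is elementary; the value of the results cited from Kazamaki (1997) and Morlais (2010) is that they extend it to unbounded integrands of BMO type and to the jump component.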
Step 2: We prove that there exists a solution to the BSDE (8.10), which we assume in Step 1. The BSDE (8.10) is a quadratic-exponential BSDE with jumps. Jeanblanc et al. (2015),  and Jiao et al. (2013) showed how to transform a quadratic-exponential BSDE with a finite number of jumps into a system of BSDEs without jumps. We apply their methods. Let τ n = 0, τ k = inf{t > τ k+1 : . . , n}, let us write the BSDE (8.10) on τ k ≤ t ≤ τ k−1 , where we assume that τ −1 = T . We get the equation: The F τ k−1 -measurable random variable Y (τ k−1 ) has the decomposition: . . . , n, (8.14) whereỸ is an F W ,B -adapted process. In order to match the terminal condition of the BSDE (8.13), given by (8.14), at the jump time τ k−1 < T , we set Consequently, the problem of solving the BSDE (8.10) can be replaced with the problem of solving the system of BSDEs ( Jeanblanc et al. (2015). We set The optimal strategy (5.7) follows from (8.11) and (8.15).
Step 3: We investigate properties of the solution (8.15). By uniqueness of solutions to the BSDEs (5.3) and the arguments from Step 2, there exists a unique solution

Proof of Proposition 5.3
Step 1: We will apply the a priori estimates from Ankirchner et al. (2007) which we adapt to our setting. We will often use the properties of the solutions to the BSDEs (5.3) which we specify in points (i)-(ii) in Proposition 5.1 (without recalling them). We will also use the energy inequality [see p. 29 in Kazamaki (1997) (F W ,B ). We choose ( , ) ∈ [− 0 , 0 ] and 0 < 0 < γ .

Step 2: We claim that the mapping γ → (Y k,γ , Z k,γ 2 ) can be directly investigated and the assertion holds for k = 0. We fix k ∈ {1, . . . , n} and we assume that the assertion holds for k − 1. We prove that the assertion holds for k. Let us introduce the function We remark that parameter γ in ψ k,γ also affects the process Y k−1,γ . The assumptions of Theorem 5.1 and Lemma 5.2 from Ankirchner et al. (2007) are satisfied. However, the quadratic term in our Eq. (5.3) is of the form γ (Z k,γ 2 ) 2 and both terms γ and Z k,γ 2 (t) are perturbated when we add to γ . If we write then we can observe that we have one additional term compared to Ankirchner et al. (2007). Adapting the proofs of Theorem 5.1 and Lemma 5.2 from Ankirchner et al. (2007) to our setting, we can derive the estimate where the constant K depends on q, T , the Lipschitz constant of (y, z 1 ) → ψ k,γ (t, y, z 1 ) and ||(γ + ) Z Taking → 0 and using the dominated convergence theorem, we can prove that the right hand side of (8.17) converges to zero. The (F W ,B ). Consequently, the assertion of Step 2 is proved.
(8.23) By (8.20)-(8.21) and Step 3.1, the norm ||U k, || R ∞ can be bounded by a constant independent of . Since γ → Y k,γ is continuous in R q (F W ,B ) by Step 2, we deduce that the right hand side of (8.23) converges to zero a.s. for a.a. t ∈ [0, T ], as ( , ) → 0. Consequently, by the dominated convergence theorem, the first term after the inequality in (8.22) converges to zero as ( , ) → 0. We are left with one more term in (8.22).
Step 4: The formulas for Z 1 and Z 2 can be proved as in Proposition 5.2.

Proof of Theorem 6.1
From the calculations in Sect. 6 we conclude that the first-order expansion of the equilibrium strategy is given by (6.3) with (6.11) and (6.14). If we use the relations between $(h^k)_{k=0}^n$, $(g^k)_{k=0}^n$ and $(Y^k, Z^k_1)_{k=0}^n$, $(\mathcal{Y}^k, \mathcal{Z}^k_1)_{k=0}^n$ established in Propositions 5.2 and 5.4, we get the strategy (6.15). We now confirm that our strategy (6.15) is admissible, i.e. it satisfies all points of Definition 3.1.
Point 1: The strategy $\hat{\pi}^*$ is F-predictable and is determined by a measurable mapping.