A maximum principle for Markov regime-switching forward–backward stochastic differential games and applications

In this paper, we present an optimal control problem for stochastic differential games under Markov regime-switching forward–backward stochastic differential equations with jumps. First, we prove a sufficient maximum principle for nonzero-sum stochastic differential games and obtain an equilibrium point for such games. Second, we prove an equivalent maximum principle for nonzero-sum stochastic differential games. The equivalent maximum principle for zero-sum stochastic differential games is then obtained as a corollary. We apply the obtained results to study a problem of robust utility maximization under a relative entropy penalty and to find the optimal investment of an insurance firm under model uncertainty.


Introduction
Expected utility theory can be seen as a theory of decision making under uncertainty based on some postulates about the agent's preferences. In general, the agent's preferences are described by a time-additive functional in which future rewards are discounted at a constant rate. The standard expected utility maximization problem assumes that the agent knows the probability measure that governs the dynamics of the underlying. However, it is difficult or even impossible to identify a single reliable probability distribution for the uncertainty. Moreover, in finance and insurance, there is no consensus on which probability measure should be used to model uncertainty. This led to the study of utility maximization under model uncertainty, the uncertainty being represented by a family of absolutely continuous (or equivalent) probability measures. The idea is to solve the problem for each probability measure in this class and to retain the one that gives the worst objective value. More specifically, the investor evaluates each portfolio by its lowest expected utility over the measures in this class, and chooses the portfolio that maximizes this worst-case value. This is also known as a robust optimization problem and has been intensively studied in recent years. For more information, the reader may consult (Bordigoni et al. 2005; Elliott and Siu 2011; Faidi et al. 2011; Jeanblanc et al. 2012; Menoukeu-Pamen 2015; Øksendal and Sulem 2012) and references therein.
Our paper is motivated by the ideas developed in Menoukeu-Pamen (2015), Menoukeu-Pamen (2014) and Øksendal and Sulem (2012), where general maximum principles for forward–backward stochastic differential games with or without delay are presented. We give a general maximum principle for forward–backward Markov regime-switching stochastic differential equations under model uncertainty. We then study a problem of recursive utility maximization with entropy penalty and show that the value function is the unique solution to a quadratic Markov regime-switching backward stochastic differential equation. This extends the results in Bordigoni et al. (2005) and Jeanblanc et al. (2012) by considering a Markov regime-switching state process and a more general stochastic differential utility (SDU). The notion of SDU was introduced in Duffie and Epstein (1992) as a continuous-time extension of the concept of recursive utility proposed in Epstein and Zin (1989) and Weil (1990). The latter notion was developed in order to disentangle risk aversion and intertemporal substitution aversion, which cannot be treated independently in the standard utility formulation.
The other motivation is to study stochastic differential game problems for Markov regime-switching systems. In a financial market, one may assume that this corresponds to the case in which the mean relative growth rate of the risky asset is not known to the agent but is subject to uncertainty; it is then regarded as a stochastic control which plays against the agent, that is, as a (zero-sum) stochastic differential game between the agent and the market. A similar problem was studied in Elliott and Siu (2011), where the objective of an insurance company is to choose an optimal investment strategy so as to maximize the expected exponential utility of terminal wealth in the worst-case scenario. The authors use the dynamic programming approach to derive the explicit optimal investment of the company and the optimal mean growth rate of the market when the interest rate is zero. In this paper, our general stochastic maximum principle extends their results to the framework of (nonzero-sum) forward–backward stochastic differential games and to more general dynamics for the state process. In addition, when the company and the market have the same level of information, we obtain explicit forms for the optimal strategies of the market and the insurance company when the Markov chain has two states and the interest rate is not zero. Let us mention that our general result can also be applied to study utility maximization under a risk constraint and model uncertainty. This is because risk measures can be written as solutions to BSDEs; hence, transforming the constrained problem into an unconstrained one leads to the setting discussed here. Another application of our result pertains to risk minimization under model uncertainty in a regime-switching market.
The remainder of the paper is organized as follows. In Sect. 2, we formulate the control problem. In Sect. 3, we derive a partial information stochastic maximum principle for forward–backward stochastic differential games for a Markov regime-switching Lévy process under model uncertainty. In Sect. 4, we apply the results to study, first, a robust utility maximization problem with entropy penalty and, second, a problem of optimal investment of an insurance company under model uncertainty. In the latter case, explicit expressions for the optimal strategies are derived.

Model and problem formulation
In this section, we formulate the general problem of stochastic differential games for Markov regime-switching forward–backward SDEs. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, where $P$ is a reference probability measure. On this probability space, we assume that we are given a one-dimensional Brownian motion $B = \{B(t)\}_{0\le t\le T}$, an irreducible, homogeneous, continuous-time Markov chain $\alpha := \{\alpha(t)\}_{0\le t\le T}$ with finite state space, and an independent Poisson random measure $N(d\zeta, ds)$ on $\mathbb{R}_+ \times \mathbb{R}_0$, where $\mathbb{R}_0 := \mathbb{R}\setminus\{0\}$. We suppose that the filtration $\mathbb{F} = \{\mathcal{F}_t\}_{0\le t\le T}$ is the $P$-augmented natural filtration generated by $B$, $N$ and $\alpha$ [see for example Donnelly (2011, Section 2) or Elliott and Siu (2011, p. 369)].
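We assume that the Markov chain takes values in a finite state space $S = \{e_1, e_2, \dots, e_D\} \subset \mathbb{R}^D$, where $D \in \mathbb{N}$, and the $j$th component of $e_n$ is the Kronecker delta $\delta_{nj}$ for each $n, j = 1, \dots, D$. Denote by $\Lambda := \{\lambda_{nj} : 1 \le n, j \le D\}$ the rate (or intensity) matrix of the Markov chain under $P$. Hence, for each $1 \le n, j \le D$, $\lambda_{nj}$ is the constant transition intensity of the chain from state $e_n$ to state $e_j$. Recall that $\lambda_{nj} \ge 0$ for $n \ne j$ and $\sum_{j=1}^D \lambda_{nj} = 0$, hence $\lambda_{nn} \le 0$. As shown in Elliott et al. (1994), $\alpha$ admits the following semimartingale representation:
$$\alpha(t) = \alpha(0) + \int_0^t \Lambda^{T} \alpha(s)\,ds + M(t),$$
where $M = \{M(t)\}_{t\in[0,T]}$ is an $\mathbb{R}^D$-valued $(\mathbb{F}, P)$-martingale and $T$ denotes the transpose of a matrix. Next we introduce the Markov jump martingales associated with $\alpha$; for more information, the reader may consult Elliott et al. (1994) or Zhang et al. (2012). For each $1 \le n, j \le D$ with $n \ne j$ and $t \in [0, T]$, denote by $J^{nj}(t)$ the number of jumps from state $e_n$ to state $e_j$ up to time $t$. It can be shown (see Elliott et al. 1994) that
$$J^{nj}(t) = \lambda_{nj} \int_0^t \langle \alpha(s-), e_n\rangle\,ds + m_{nj}(t),$$
where $m_{nj} := \{m_{nj}(t)\}_{t\in[0,T]}$ is an $(\mathbb{F}, P)$-martingale. Fix $j \in \{1, 2, \dots, D\}$ and denote by $\Phi_j(t)$ the number of jumps into state $e_j$ up to time $t$. Then
$$\Phi_j(t) = \sum_{n=1, n\ne j}^D J^{nj}(t) = \lambda_j(t) + \tilde\Phi_j(t),$$
with $\tilde\Phi_j(t) = \sum_{n=1, n\ne j}^D m_{nj}(t)$ and $\lambda_j(t) = \sum_{n=1, n\ne j}^D \lambda_{nj} \int_0^t \langle \alpha(s-), e_n\rangle\,ds$. Note that for each $j \in \{1, 2, \dots, D\}$, $\tilde\Phi_j := \{\tilde\Phi_j(t)\}_{t\in[0,T]}$ is an $(\mathbb{F}, P)$-martingale. Suppose that the compensator of $N(d\zeta, ds)$ is given by
$$\nu_\alpha(d\zeta, ds) := \langle \nu(d\zeta|s), \alpha(s-)\rangle\,\eta(ds),$$
where $\eta(ds)$ is a $\sigma$-finite measure on $\mathbb{R}_+$ and $\nu(d\zeta|s) := (\nu_{e_1}(d\zeta|s), \nu_{e_2}(d\zeta|s), \dots, \nu_{e_D}(d\zeta|s)) \in \mathbb{R}^D$ is a function of $s$. Let us mention that for each $j = 1, \dots, D$, $\nu_{e_j}(d\zeta|s) = \nu_j(d\zeta|s)$ represents the conditional Lévy density of jump sizes of $N(d\zeta, ds)$ at time $s$ when $\alpha(s-) = e_j$, and satisfies $\int_{\mathbb{R}_0} \min(1, \zeta^2)\,\nu_j(d\zeta|s) < \infty$.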
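The decomposition of the jump counts $J^{nj}$ into a compensator and a martingale can be checked numerically. The Python sketch below is only an illustration, not part of the paper's method: it simulates a chain with a hypothetical two-state rate matrix `Lam` and estimates $E[m_{nj}(T)] = E\big[J^{nj}(T) - \lambda_{nj}\int_0^T \langle\alpha(s-), e_n\rangle\,ds\big]$ by Monte Carlo. Since $m_{nj}$ is a martingale started at 0, these averages should be close to zero. States are indexed $0, \dots, D-1$ in the code in place of $e_1, \dots, e_D$.

```python
import numpy as np


def simulate_chain(Lam, x0, T, rng):
    """Simulate a continuous-time Markov chain with rate matrix Lam on [0, T].

    Returns the jump-count matrix counts[n, j] (jumps n -> j, i.e. J^{nj}(T))
    and the occupation times occ[n] = int_0^T <alpha(s-), e_n> ds.
    """
    D = Lam.shape[0]
    t, state = 0.0, x0
    counts = np.zeros((D, D))
    occ = np.zeros(D)
    while True:
        rate = -Lam[state, state]               # total exit rate of current state
        hold = rng.exponential(1.0 / rate) if rate > 0 else np.inf
        if t + hold >= T:                       # no further jump before T
            occ[state] += T - t
            break
        occ[state] += hold
        t += hold
        probs = Lam[state].copy()
        probs[state] = 0.0
        probs /= rate                           # P(next = e_j) = lambda_nj / (-lambda_nn)
        new_state = rng.choice(D, p=probs)
        counts[state, new_state] += 1
        state = new_state
    return counts, occ


def mean_jump_martingales(Lam, x0, T, n_paths, seed=0):
    """Monte Carlo estimate of E[m_nj(T)], m_nj(T) = J^{nj}(T) - lambda_nj * occ_n(T)."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(Lam)
    for _ in range(n_paths):
        counts, occ = simulate_chain(Lam, x0, T, rng)
        m = counts - Lam * occ[:, None]         # entry [n, j] = J^{nj} - lambda_nj * occ_n
        np.fill_diagonal(m, 0.0)                # only n != j is meaningful
        total += m
    return total / n_paths


Lam = np.array([[-1.0, 1.0],
                [0.5, -0.5]])                   # hypothetical two-state rate matrix
means = mean_jump_martingales(Lam, x0=0, T=5.0, n_paths=4000, seed=1)
print(means)                                    # entries should be close to 0
```

The off-diagonal entries of `means` estimate $E[m_{nj}(T)]$; their deviation from zero shrinks roughly like $1/\sqrt{\texttt{n\_paths}}$.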
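Suppose that the state process $X(t) = X^{(u)}(t, \omega)$, $0 \le t \le T$, $\omega \in \Omega$, is a controlled Markov regime-switching jump-diffusion process of the form
$$\begin{cases} dX(t) = b(t, X(t), \alpha(t), u(t))\,dt + \sigma(t, X(t), \alpha(t), u(t))\,dB(t) + \displaystyle\int_{\mathbb{R}_0} \gamma(t, X(t-), \alpha(t-), u(t), \zeta)\,\tilde N_\alpha(d\zeta, dt) \\ \qquad\qquad + \displaystyle\sum_{j=1}^D \eta^j(t, X(t-), \alpha(t-), u(t))\,d\tilde\Phi_j(t), \quad t \in [0, T], \\ X(0) = x_0 \in \mathbb{R}, \end{cases} \tag{2.5}$$
where $\tilde N_\alpha(d\zeta, dt) := N(d\zeta, dt) - \nu_\alpha(d\zeta, dt)$ is the compensated Poisson random measure and $\eta = (\eta^1, \dots, \eta^D)$. In a financial market, the above model makes it possible to incorporate the impact of changes in macro-economic conditions on the dynamics of an asset's price, as well as the occurrence of unpredictable events that could affect the price dynamics. One could think of the Brownian motion part as random shocks in the price of a risky asset. The Poisson jump part accounts for jumps in the asset price caused by lack of information or unexpected events. The Markov chain makes it possible to describe economic cycles: the states of the chain represent the different states of the economy, whereas the jumps of the martingales associated with the chain represent transitions in economic conditions.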
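To make the state equation concrete, here is a minimal Euler-type simulation sketch for a one-dimensional instance. All coefficient choices are hypothetical and for illustration only: $b = u\,\mu_n x$, $\sigma = u\,\sigma_n x$, $\gamma(t, x, e_n, u, \zeta) = u x \zeta$ with Gaussian jump sizes $\zeta$ arriving at a regime-dependent Poisson rate, a constant control $u$, and $\eta \equiv 0$ (no extra jump of $X$ at regime switches). The compensated Poisson integral is implemented by subtracting the compensator $\lambda_n E[\zeta]\,u x\,dt$ from the drift, and on each step the regime at the left endpoint is used, mirroring $\alpha(t-)$.

```python
import numpy as np

# Hypothetical two-regime coefficients, chosen only for illustration.
MU = np.array([0.08, -0.02])          # b(t, x, e_n, u)     = u * MU[n] * x
SIG = np.array([0.15, 0.35])          # sigma(t, x, e_n, u) = u * SIG[n] * x
JUMP_RATE = np.array([0.5, 2.0])      # intensity of N when alpha(t-) = e_n
JUMP_MEAN = np.array([-0.02, -0.05])  # mean of the jump size zeta in regime n
JUMP_SD = 0.01                        # std of zeta (same in both regimes)
LAM = np.array([[-0.5, 0.5],
                [1.0, -1.0]])         # rate matrix of the chain


def simulate_path(x0, n0, u, T, n_steps, rng):
    """Euler scheme for dX = b dt + sigma dB + int gamma N~_alpha(dzeta, dt),
    with gamma = u * x * zeta and eta = 0."""
    dt = T / n_steps
    D = LAM.shape[0]
    x, n = x0, n0
    xs, regimes = [x], [n]
    for _ in range(n_steps):
        # Coefficients are frozen at the left endpoint (regime alpha(t-)).
        drift = u * MU[n] * x
        vol = u * SIG[n] * x
        comp = JUMP_RATE[n] * JUMP_MEAN[n] * u * x      # compensator of the jump term
        k = rng.poisson(JUMP_RATE[n] * dt)              # number of jumps in this step
        zetas = rng.normal(JUMP_MEAN[n], JUMP_SD, size=k)
        jumps = u * x * zetas.sum()
        dB = rng.normal(0.0, np.sqrt(dt))
        x = x + (drift - comp) * dt + vol * dB + jumps
        # Regime switch with probability approximately -LAM[n, n] * dt.
        if rng.random() < -LAM[n, n] * dt:
            probs = np.clip(LAM[n], 0.0, None)          # keep off-diagonal rates only
            n = rng.choice(D, p=probs / probs.sum())
        xs.append(x)
        regimes.append(n)
    return np.array(xs), np.array(regimes)


rng = np.random.default_rng(0)
xs, regimes = simulate_path(x0=1.0, n0=0, u=1.0, T=1.0, n_steps=1000, rng=rng)
print(xs[-1], regimes[-1])
```

With $u = 0$ every coefficient vanishes and the path stays at $x_0$, which is a quick sanity check on the scheme.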
In this paper, we consider the nonzero-sum stochastic differential game problem. This means that one player's gain (respectively loss) does not necessarily result in the other player's loss (respectively gain). In our model, the control is $u = (u_1, u_2)$, where $u_i$ is the control of player $i$, $i = 1, 2$. We suppose that the different levels of information available at time $t$ to player $i$, $i = 1, 2$, are modelled by two subfiltrations of $\mathbb{F}$, which involve a given constant delay $\delta \ge 0$. Denote by $\mathcal{A}_i$ the set of admissible controls of player $i$, contained in the set of processes adapted to the information subfiltration of player $i$. The functions $b$, $\sigma$, $\gamma$ and $\eta$ are given such that, for all $t$ and $n = 1, \dots, D$, the processes $b(t, x, e_n, u, \cdot)$, $\sigma(t, x, e_n, u, \cdot)$, $\gamma(t, x, e_n, u, \zeta, \cdot)$ and $\eta(t, x, e_n, u, \cdot)$ are $\mathcal{F}_t$-progressively measurable for all $x \in \mathbb{R}$, $u \in \mathcal{A}_1 \times \mathcal{A}_2$ and $\zeta \in \mathbb{R}_0$, and such that (2.5) has a unique strong solution for any admissible control $u \in \mathcal{A}_1 \times \mathcal{A}_2$. Existence and uniqueness of the solution to (2.5) are ensured, for example, if $b$, $\sigma$, $\gamma$ and $\eta$ are globally Lipschitz continuous in $x$ and satisfy a linear growth condition in $x$; see for example Applebaum (2009, Theorem 6.2.3), Mao and Yuan (2006, Theorem 3.13) and Kulinich and Kushnirenko (2014, Theorem).
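For each player $i$, we consider the associated BSDE in the unknowns $(Y_i(t), Z_i(t), K_i(t, \zeta), V_i(t))$:
$$\begin{cases} dY_i(t) = -g_i(t, X(t), \alpha(t), Y_i(t), Z_i(t), K_i(t, \cdot), V_i(t), u(t))\,dt + Z_i(t)\,dB(t) + \displaystyle\int_{\mathbb{R}_0} K_i(t, \zeta)\,\tilde N_\alpha(d\zeta, dt) + \sum_{j=1}^D V_i^j(t)\,d\tilde\Phi_j(t), \\ Y_i(T) = h_i(X(T), \alpha(T)). \end{cases} \tag{2.7}$$
Here $g_i : [0, T] \times \mathbb{R} \times S \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathcal{A}_1 \times \mathcal{A}_2 \to \mathbb{R}$ and $h_i : \mathbb{R} \times S \to \mathbb{R}$ are such that the BSDE (2.7) has a unique solution for any admissible control $u \in \mathcal{A}_1 \times \mathcal{A}_2$. For sufficient conditions for existence and uniqueness of solutions to Markov regime-switching BSDEs, we refer the reader to Cohen and Elliott (2010, Theorem 1.1), Crepey (2010, Proposition 14.4.1) or Tang and Li (1994, Lemma 2.4) and references therein. For example, such a unique solution exists if $g_i(\cdot, x, e_n, y, z, k, v, u)$ is uniformly Lipschitz continuous with respect to $x, y, z, k, v$, the random variable $h_i(X(T), \alpha(T))$ is square integrable and $g_i(t, 0, e_n, 0, 0, 0, 0, u)$ is uniformly bounded.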
Let $f_i : [0, T] \times \mathbb{R} \times S \times \mathcal{A}_1 \times \mathcal{A}_2 \to \mathbb{R}$, $\varphi_i : \mathbb{R} \times S \to \mathbb{R}$ and $\psi_i : \mathbb{R} \to \mathbb{R}$, $i = 1, 2$, be given $C^1$ functions with respect to their arguments, with $\psi_i(x) \ge 0$ for all $x$, $i = 1, 2$. For the nonzero-sum game, the control actions are not free and generate, for each player $i$, $i = 1, 2$, a performance functional (2.8). Here, $f_i$, $\varphi_i$ and $\psi_i$ may be seen as the profit rate, the bequest function and the "utility evaluation", respectively, of player $i$, $i = 1, 2$. For $t = 0$, we put $J_i(u) := J_i(0, u)$. Let us note that in the nonzero-sum game the players do not share the same performance functional; instead, each of them uses his own performance functional. In addition, they have the same type of objective, namely to maximize their own performance functional.
To be more precise, the nonzero-sum game is the following: Problem 2.1 Find $(u_1^*, u_2^*) \in \mathcal{A}_1 \times \mathcal{A}_2$ (if it exists) such that
$$J_1(u_1, u_2^*) \le J_1(u_1^*, u_2^*) \quad \text{for all } u_1 \in \mathcal{A}_1, \qquad J_2(u_1^*, u_2) \le J_2(u_1^*, u_2^*) \quad \text{for all } u_2 \in \mathcal{A}_2.$$
If it exists, we call such a pair $(u_1^*, u_2^*)$ a Nash equilibrium. Intuitively, player I controls $u_1$ while player II controls $u_2$. We assume that each player knows the equilibrium strategy of the other player and gains nothing by changing his own strategy unilaterally. If each player is making the best decision she can, given the other player's decision, then we say that the two players are in a Nash equilibrium.
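The equilibrium condition above says that no player can improve his own performance functional by a unilateral deviation. As a finite, deterministic analogue (not the stochastic game itself), the Python sketch below checks this condition in a two-player game where each player chooses among finitely many actions and the payoffs are given by hypothetical matrices `J1`, `J2`. The example is the classical prisoner's dilemma, a nonzero-sum game in which one player's gain is not the other player's loss.

```python
import numpy as np


def pure_nash_equilibria(J1, J2):
    """All pure-action pairs (i, j) at which neither player gains by deviating alone.

    Player I picks the row i and maximizes J1; player II picks the column j and
    maximizes J2 (both players maximize, as in the nonzero-sum problem).
    """
    J1, J2 = np.asarray(J1), np.asarray(J2)
    eq = []
    for i in range(J1.shape[0]):
        for j in range(J1.shape[1]):
            best_for_I = J1[i, j] >= J1[:, j].max()    # no better row given column j
            best_for_II = J2[i, j] >= J2[i, :].max()   # no better column given row i
            if best_for_I and best_for_II:
                eq.append((i, j))
    return eq


# Prisoner's dilemma: action 0 = cooperate, action 1 = defect.
J1 = np.array([[3, 0],
               [5, 1]])
J2 = np.array([[3, 5],
               [0, 1]])
print(pure_nash_equilibria(J1, J2))   # prints [(1, 1)]
```

Mutual defection is the unique pure equilibrium even though both players would prefer mutual cooperation, which illustrates why the nonzero-sum equilibrium is defined player by player rather than through a single value.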

A stochastic maximum principle for Markov regime-switching forward-backward stochastic differential games
In this section, we derive a Nash equilibrium for Problem 2.1 based on a stochastic maximum principle for Markov regime-switching forward–backward stochastic differential equations.
Define the Hamiltonians $H_i(t, x, e_n, y, z, k, v, u_1, u_2, a, p, q, r(\cdot), w)$, $i = 1, 2$, by (3.1), where $\mathcal{R}$ denotes the set of all functions $k : [0, T] \times \mathbb{R}_0 \to \mathbb{R}$ for which the integral in (3.1) converges. An example of such a set is $L^2(\nu_\alpha)$. We suppose that $H_i$, $i = 1, 2$, is Fréchet differentiable in the variables $x, y, z, k, v, u$ and that $\nabla_k H_i(t, \zeta)$, $i = 1, 2$, is a random measure which is absolutely continuous with respect to $\nu$. Next, we define the associated adjoint processes $A_i(t)$, $p_i(t)$, $q_i(t)$, $r_i(t, \cdot)$ and $w_i(t)$, $t \in [0, T]$, $\zeta \in \mathbb{R}_0$, by the following forward–backward SDE.

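The Markovian regime-switching forward SDE in $A_i(t)$ is (3.2), and the Markovian regime-switching backward SDE in $(p_i(t), q_i(t), r_i(t, \zeta), w_i(t))$ is (3.3).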
Here and in what follows, we use the notation

A sufficient maximum principle
In what follows, we give the sufficient maximum principle.
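Remark 3.2 In the above theorem and in its proof, we have used the following shorthand notation: for $i = 1$, the processes corresponding to $u = (u_1, \hat u_2)$ are denoted, for example, by $X(t) = X^{(u_1, \hat u_2)}(t)$, and the processes corresponding to $\hat u = (\hat u_1, \hat u_2)$ by $\hat X(t) = X^{(\hat u_1, \hat u_2)}(t)$. Similar notation is used for $i = 2$. The integrability condition (3.9) ensures the existence of the stochastic integrals when applying the Itô formula in the proof of the theorem.
• Let $V$ be a Banach space with norm $\|\cdot\|$ and let $F : V \to \mathbb{R}$. We say that $F$ has a directional derivative (or Gateaux derivative) at $x \in V$ in the direction $y \in V$ if
$$D_y F(x) := \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\big(F(x + \varepsilon y) - F(x)\big)$$
exists.
• We say that $F$ is Fréchet differentiable at $x \in V$ if there exists a bounded linear map $L : V \to \mathbb{R}$ such that
$$\lim_{h \to 0,\; h \in V} \frac{1}{\|h\|}\,\big|F(x + h) - F(x) - L(h)\big| = 0.$$
In this case we call $L$ the Fréchet derivative of $F$ at $x$, and we write $L = \nabla_x F$.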
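For intuition, take $V = \mathbb{R}^3$. When $F$ is Fréchet differentiable, its Gateaux derivative in a direction $y$ equals $L(y)$, the Fréchet derivative applied to $y$. The Python sketch below compares a central finite-difference approximation of $D_y F(x)$ with the exact linear map $L(h) = \langle x, h\rangle + \cos(x_1)\,h_1$ for the hypothetical functional $F(x) = \tfrac12|x|^2 + \sin(x_1)$ (the code indexes coordinates from 0, so $x_1$ is `x[0]`).

```python
import numpy as np


def gateaux_derivative(F, x, y, eps=1e-6):
    """Central finite-difference approximation of D_y F(x) = d/de F(x + e*y) at e = 0."""
    return (F(x + eps * y) - F(x - eps * y)) / (2.0 * eps)


def F(x):
    """Hypothetical smooth functional on R^3."""
    return 0.5 * np.dot(x, x) + np.sin(x[0])


def frechet_derivative(x, h):
    """Exact Frechet derivative of F at x, applied to h: <x, h> + cos(x_0) * h_0."""
    return np.dot(x, h) + np.cos(x[0]) * h[0]


x = np.array([0.3, -1.2, 2.0])
y = np.array([1.0, 0.5, -0.25])
print(gateaux_derivative(F, x, y), frechet_derivative(x, y))
```

The central difference has truncation error of order $\varepsilon^2$, so the two printed numbers agree to many digits; in infinite dimensions the converse fails in general, which is why the Fréchet notion is the one used for $H_i$.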

An equivalent maximum principle
The concavity condition on the Hamiltonians does not hold in many applications. In this section, we prove an equivalent stochastic maximum principle which does not require this assumption. We shall assume the following: It follows from (2.5) and (2.7) that (3.13) We can obtain dX 2 (t) and dy 2 (t) in a similar way.
Remark 3.4 For sufficient conditions for the existence and uniqueness of solutions to (3.12) and (3.13), the reader may consult Peng (1993, Eq. 4.1) (in the case of diffusion state processes).
As an example, a set of sufficient conditions under which (3.12) and (3.13) admit a unique solution is as follows:
1. Assume that the coefficients $b, \sigma, \gamma, \eta, g_i, h_i, f_i, \psi_i$ and $\varphi_i$, $i = 1, 2$, are continuous with respect to their arguments and continuously differentiable with respect to $(x, y, z, k, v, u)$. (Here, the dependence of $g_i$ and $f_i$ on $k$ is through an integral in $\zeta$; hence the differentiability in this argument is in the Fréchet sense.)
2. The derivatives of $b, \sigma, \gamma, \eta$ with respect to $x, u$, the derivatives of $h_i$, $i = 1, 2$, with respect to $x$, and the derivatives of $g_i$, $i = 1, 2$, with respect to $x, y, z, k, v, u$ are bounded.
3. The derivatives of $f_i$, $i = 1, 2$, with respect to $x, u$ are bounded by $C(1 + |x| + |u|)$.
4. The derivatives of $\psi_i$ and $\varphi_i$ with respect to $x$ are bounded by $C(1 + |x|)$.
We can state the following equivalent maximum principle: (3.12) and (3.13), respectively. Suppose that Assumptions A.1, A.2 and A.3 hold. Moreover, assume the following integrability conditions Then the following are equivalent: Proof See "Appendix".
Remark 3.6 The integrability conditions (3.14) and (3.15) guarantee the existence of the stochastic integrals when applying the Itô formula in the proof of the theorem. Note also that the result is the same if we start from $t \ge 0$ in the performance functional, which extends Øksendal and Sulem (2012, Theorem 2.2) to the Markov regime-switching setting.

Zero-sum Game
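In this section, we solve the zero-sum Markov regime-switching forward–backward stochastic differential game problem (or worst-case scenario optimal problem); that is, we assume that the performance functional of Player II is the negative of that of Player I, i.e.,
$$J_1(t, u_1, u_2) = J(t, u_1, u_2) = -J_2(t, u_1, u_2).$$
In this case $(u_1^*, u_2^*)$ is a Nash equilibrium if and only if
$$\operatorname*{ess\,sup}_{u_1 \in \mathcal{A}_1} J(t, u_1, u_2^*) = J(t, u_1^*, u_2^*) = \operatorname*{ess\,inf}_{u_2 \in \mathcal{A}_2} J(t, u_1^*, u_2). \tag{3.18}$$
On the one hand, (3.18) implies that
$$\operatorname*{ess\,inf}_{u_2 \in \mathcal{A}_2} \operatorname*{ess\,sup}_{u_1 \in \mathcal{A}_1} J(t, u_1, u_2) \le J(t, u_1^*, u_2^*) \le \operatorname*{ess\,sup}_{u_1 \in \mathcal{A}_1} \operatorname*{ess\,inf}_{u_2 \in \mathcal{A}_2} J(t, u_1, u_2).$$
On the other hand, the reverse inequality $\operatorname*{ess\,sup}_{u_1} \operatorname*{ess\,inf}_{u_2} J(t, u_1, u_2) \le \operatorname*{ess\,inf}_{u_2} \operatorname*{ess\,sup}_{u_1} J(t, u_1, u_2)$ always holds, so all these quantities coincide.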
The zero-sum Markov regime-switching forward–backward stochastic differential game problem is therefore the following: Problem 3.7 Find $u_1^* \in \mathcal{A}_1$ and $u_2^* \in \mathcal{A}_2$ (if they exist) such that
$$\operatorname*{ess\,inf}_{u_2 \in \mathcal{A}_2} \operatorname*{ess\,sup}_{u_1 \in \mathcal{A}_1} J(t, u_1, u_2) = J(t, u_1^*, u_2^*) = \operatorname*{ess\,sup}_{u_1 \in \mathcal{A}_1} \operatorname*{ess\,inf}_{u_2 \in \mathcal{A}_2} J(t, u_1, u_2). \tag{3.19}$$
When it exists, a control $(u_1^*, u_2^*)$ satisfying (3.19) is called a saddle point. The actions of the players are opposite; more precisely, between Players I and II there is a payoff $J(t, u_1, u_2)$, which is a reward for Player I and a cost for Player II. In the case of a zero-sum game, there is only one value function for the players and therefore Theorem 3.1 becomes: Theorem 3.9 (Sufficient maximum principle for regime-switching FBSDE zero-sum games) Let $(\hat u_1, \hat u_2) \in \mathcal{A}_1 \times \mathcal{A}_2$ with corresponding solutions $\hat X(t)$, $(\hat Y(t), \hat Z(t), \hat K(t, \zeta), \hat V(t))$, $\hat A(t)$, $(\hat p(t), \hat q(t), \hat r(t, \zeta), \hat w(t))$ of (2.5), (2.7), (3.2) and (3.3), respectively. Suppose that the following hold: 1. For each $e_n \in S$, the functions H (t, x, μ 1 , e n , y, z, k, v, μ 1 , u 2 (t), A, a.s. and H(t, x, e n , y, z, k, v) = ess inf for all t ∈ [0, T ], a.s. and for all t ∈ [0, T ], a.s. Here 4. $\frac{d}{d\nu}\nabla_k g(t, \zeta) > -1$. 5. In addition, the integrability condition (3.9) is satisfied for $p_i = p$, etc.
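In a finite zero-sum game the essential suprema and infima become ordinary maxima and minima, and a saddle point is an entry that is simultaneously the largest in its column (Player I, choosing rows, cannot gain by deviating) and the smallest in its row (Player II, choosing columns, cannot lower the payoff). The Python sketch below, with hypothetical payoff matrices, computes the lower value $\max_{u_1}\min_{u_2} J$ and the upper value $\min_{u_2}\max_{u_1} J$; for finite matrix games a pure saddle point exists exactly when these two values coincide, which is the finite counterpart of the argument following (3.18).

```python
import numpy as np


def saddle_points(J):
    """Pure saddle points (i, j) of a finite zero-sum game: Player I (rows) maximizes J,
    Player II (columns) minimizes J, so J[i, j] must be the maximum of column j and the
    minimum of row i."""
    J = np.asarray(J, dtype=float)
    return [(i, j)
            for i in range(J.shape[0])
            for j in range(J.shape[1])
            if J[i, j] == J[:, j].max() and J[i, j] == J[i, :].min()]


def lower_upper_values(J):
    """Lower value max_i min_j J[i, j] and upper value min_j max_i J[i, j]."""
    J = np.asarray(J, dtype=float)
    lower = J.min(axis=1).max()
    upper = J.max(axis=0).min()
    return float(lower), float(upper)


J = [[4, 2, 5],
     [3, 1, 6],
     [7, 3, 8]]                          # hypothetical payoff to Player I
print(saddle_points(J), lower_upper_values(J))

pennies = [[1, -1],
           [-1, 1]]                      # matching pennies: no pure saddle point
print(saddle_points(pennies), lower_upper_values(pennies))
```

In the second example the lower value is strictly smaller than the upper value, so no pure saddle point exists; in continuous-time games this gap is exactly what the conditions of Theorem 3.9 are designed to rule out.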

Application to robust utility maximization with entropy penalty
In this section, we apply the results obtained in Sect. 3 to study a utility maximization problem under model uncertainty. We assume that E The framework is that of Bordigoni et al. (2005). For any probability measure $Q$ on $(\Omega, \mathcal{F}_T)$, let
$$H(Q|P) := \begin{cases} E_Q\Big[\log \dfrac{dQ}{dP}\Big] & \text{if } Q \ll P \text{ on } \mathcal{F}_T, \\ +\infty & \text{otherwise}, \end{cases}$$
be the relative entropy of $Q$ with respect to $P$. We aim at finding a probability measure $Q \in \mathcal{Q}_{\mathcal{F}}$ that minimizes the functional with $a$ and $\bar a$ non-negative constants; $\kappa = (\kappa(t))_{0 \le t \le T}$ a non-negative, bounded and progressively measurable process; $U_1 = (U_1(t))_{0 \le t \le T}$ a progres- is the discount factor and $R^\kappa(t, T)$ is the penalization term, representing the sum of the entropy rate and the terminal entropy, i.e.
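with $G^Q = (G^Q(t))_{0 \le t \le T}$ the RCLL $P$-martingale representing the density of $Q$ with respect to $P$, i.e.,
$$G^Q(t) = \frac{dQ}{dP}\Big|_{\mathcal{F}_t} = E\Big[\frac{dQ}{dP}\,\Big|\,\mathcal{F}_t\Big], \quad 0 \le t \le T.$$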
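On a finite sample space $\{\omega_1, \dots, \omega_K\}$ with $p_k = P(\{\omega_k\})$ and $q_k = Q(\{\omega_k\})$, the density is $\frac{dQ}{dP}(\omega_k) = q_k/p_k$ and the relative entropy reduces to $H(Q|P) = \sum_k q_k \log(q_k/p_k)$. The short Python sketch below evaluates this finite-state version, with the convention $0 \log 0 = 0$ and $H = +\infty$ when $Q$ is not absolutely continuous with respect to $P$. It only illustrates the definition of the entropy; it is not the full penalized functional of Bordigoni et al. (2005).

```python
import numpy as np


def relative_entropy(q, p):
    """H(Q|P) = E_Q[log(dQ/dP)] for probability vectors q, p on a finite sample space.

    Returns +inf if Q is not absolutely continuous with respect to P.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    if np.any((p == 0) & (q > 0)):      # Q charges a P-null set
        return np.inf
    mask = q > 0                        # convention 0 * log 0 = 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


P = [0.25, 0.75]                        # hypothetical reference measure
Q = [0.5, 0.5]                          # hypothetical alternative model
print(relative_entropy(Q, P))           # equals 0.5 * log(4/3)
```

The entropy is zero only when $Q = P$ and grows as $Q$ moves away from $P$, which is why it serves as a penalty for considering alternative models.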

Application to optimal investment of an insurance company under model uncertainty
In this section, we use our general framework to study a problem of optimal investment of an insurance company under model uncertainty. The uncertainty here is also described by a family of probability measures. Such a problem was solved in Elliott and Siu (2011) using a dynamic programming approach when the interest rate is 0. We show that the general maximum principle enables us to find the explicit optimal investment when $r \ne 0$. We restrict ourselves to the case E (1) T ], i = 1, 2 then the problem is non-Markovian and hence the dynamic programming used in Elliott and Siu (2011) cannot be applied.
The model is that of Elliott and Siu (2011, Section 2.1). Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, with $P$ representing a reference probability measure from which a family of real-world probability measures is generated. We suppose that $(\Omega, \mathcal{F}, P)$ is rich enough to account for uncertainties coming from future insurance claims, fluctuations of financial prices and structural changes in economic conditions. We consider a continuous-time Markov regime-switching economic model with a bond and a stock or share index.
The evolution of the state of the economy over time is modeled by a continuous-time, finite-state, observable Markov chain $\alpha := \{\alpha(t), t \in [0, T]\}$, $T < \infty$, on $(\Omega, \mathcal{F}, P)$, taking values in the state space $S = \{e_1, e_2, \dots, e_D\}$, where $D \ge 2$. We denote by $\Lambda := \{\lambda_{nj} : 1 \le n, j \le D\}$ the intensity matrix of the Markov chain under $P$. Hence, for each $1 \le n, j \le D$, $\lambda_{nj}$ is the transition intensity of the chain from state $e_n$ to state $e_j$. It is assumed that $\lambda_{nj} > 0$ for $n \ne j$ and $\sum_{j=1}^D \lambda_{nj} = 0$, hence $\lambda_{nn} < 0$. The dynamics of $(\alpha(t))_{0 \le t \le T}$ is given in Sect. 2.
Let $r = \{r(t)\}_{t\in[0,T]}$ be the instantaneous interest rate of the money market account $B$ at time $t$. Then
$$r(t) := \langle r, \alpha(t)\rangle, \tag{4.19}$$
where $\langle\cdot,\cdot\rangle$ is the usual scalar product in $\mathbb{R}^D$ and $r = (r_1, \dots, r_D) \in \mathbb{R}^D_+$. Here the value $r_j$, the $j$th entry of the vector $r$, represents the value of the interest rate when the Markov chain is in the state $e_j$, i.e., when $\alpha(t) = e_j$. The price dynamics of $B$ can now be written as
$$dB(t) = r(t)B(t)\,dt. \tag{4.20}$$
Moreover, let $\mu = \{\mu(t)\}_{t\in[0,T]}$ and $\sigma = \{\sigma(t)\}_{t\in[0,T]}$ denote, respectively, the mean return and the volatility of the stock at time $t$. Using the same convention, we have
$$\mu(t) := \langle \mu, \alpha(t)\rangle, \qquad \sigma(t) := \langle \sigma, \alpha(t)\rangle.$$
In a similar way, $\mu_j$ and $\sigma_j$ represent, respectively, the appreciation rate and the volatility of the stock when the Markov chain is in state $e_j$, i.e., when $\alpha(t) = e_j$. Let $B = \{B_t\}_{t\in[0,T]}$ denote the standard Brownian motion on $(\Omega, \mathcal{F}, P)$ with respect to its right-continuous complete filtration $\mathbb{F}^B := \{\mathcal{F}^B_t\}_{0\le t\le T}$. Then the dynamics of the stock price $S = \{S(t)\}_{t\in[0,T]}$ is given by the following Markov regime-switching geometric Brownian motion:
$$dS(t) = S(t)\big(\mu(t)\,dt + \sigma(t)\,dB(t)\big). \tag{4.21}$$
Let $Z_0 := \{Z_0(t)\}_{t\in[0,T]}$ be a real-valued Markov regime-switching pure jump process on $(\Omega, \mathcal{F}, P)$. Here $Z_0(t)$ can be considered as the aggregate amount of claims up to and including time $t$. Since $Z_0$ is a pure jump process, one has
$$Z_0(t) = \sum_{0 < u \le t} \Delta Z_0(u),$$
where, for each $u \in [0, T]$, $\Delta Z_0(u) = Z_0(u) - Z_0(u-)$ represents the jump size of $Z_0$ at time $u$. Assume that the state space of claim sizes, denoted by $\mathcal{Z}$, is $(0, \infty)$. Let $\mathcal{M}$ be the product space $[0, T] \times \mathcal{Z}$ of claim arrival times and claim sizes. Define a random measure $N_0(\cdot, \cdot)$ on the product space $\mathcal{M}$ which selects claim arrivals and sizes $\zeta := Z_0(u) - Z_0(u-)$ at time $u$; then the aggregate insurance claim process $Z_0$ can be written as
$$Z_0(t) = \int_0^t \int_0^\infty \zeta\, N_0(d\zeta, du).$$
Claims arrive with the regime-dependent intensity
$$\lambda^0(t) := \langle \lambda^0, \alpha(t-)\rangle, \tag{4.23}$$
with $\lambda^0 = (\lambda^0_1, \dots, \lambda^0_D) \in \mathbb{R}^D_+$. Here the value $\lambda^0_j$, the $j$th entry of the vector $\lambda^0$, represents the intensity rate of $N_0$ when the Markov chain is in the state $e_j$, i.e., when $\alpha(t-) = e_j$. Denote by $F_j(\zeta)$, $j = 1, \dots, D$, the probability distribution of the claim size $\zeta := Z_0(u) - Z_0(u-)$ when $\alpha(t-) = e_j$. Then the compensator of the Markov regime-switching random measure $N_0(\cdot, \cdot)$ under $P$ is given by
$$\nu^0_\alpha(d\zeta, dt) := \sum_{j=1}^D \langle \alpha(t-), e_j\rangle\, \lambda^0_j\, F_j(d\zeta)\, dt.$$
Hence a compensated version $\tilde N^0_\alpha(\cdot, \cdot)$ of the Markov regime-switching random measure is defined by
$$\tilde N^0_\alpha(d\zeta, dt) := N_0(d\zeta, dt) - \nu^0_\alpha(d\zeta, dt).$$
The premium rate $P_0(t)$ at time $t$ is given by
$$P_0(t) := \langle P_0, \alpha(t)\rangle,$$
with $P_0 = (P_{0,1}, \dots, P_{0,D}) \in \mathbb{R}^D_+$. Let $R_0 := \{R_0(t)\}_{t\in[0,T]}$ be the surplus process of the insurance company without investment. Then
$$R_0(t) = r_0 + \int_0^t P_0(s)\,ds - Z_0(t) = r_0 + \sum_{j=1}^D P_{0,j}\, J_j(t) - Z_0(t),$$
with $R_0(0) = r_0$. For each $j = 1, \dots, D$ and each $t \in [0, T]$, $J_j(t)$ is the occupation time of the chain $\alpha$ in the state $e_j$ up to time $t$, that is,
$$J_j(t) := \int_0^t \langle \alpha(s), e_j\rangle\, ds. \tag{4.30}$$
The following information structure will be important for the derivation of the dynamics of the company's surplus process. Let $\mathbb{F}^{Z_0} := \{\mathcal{F}^{Z_0}_t\}_{0\le t\le T}$ denote the right-continuous $P$-completed filtration generated by $Z_0$. For each $t \in [0, T]$ define F t := F Z 0 From now on, we assume that the insurance company invests the amount $\pi(t)$ in the stock at time $t$, for each $t \in [0, T]$. Then $\pi = \{\pi(t), t \in [0, T]\}$ represents the portfolio process. Denote by $X = \{X^\pi(t)\}_{t\in[0,T]}$ the wealth process of the company. One can show that the dynamics of the surplus process is given by
$$dX^\pi(t) = \big(P_0(t) + r(t)X^\pi(t) + \pi(t)(\mu(t) - r(t))\big)\,dt + \sigma(t)\pi(t)\,dB(t) - \int_0^\infty \zeta\, N_0(d\zeta, dt). \tag{4.31}$$
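Because the interest rate and the premium rate have the form $\langle r, \alpha(t)\rangle$ and $\langle P_0, \alpha(t)\rangle$, their time integrals reduce to weighted sums of the occupation times: $\int_0^t r(s)\,ds = \sum_j r_j J_j(t)$ and $\int_0^t P_0(s)\,ds = \sum_j P_{0,j} J_j(t)$. The Python sketch below computes $J_j(t)$ for a given piecewise-constant regime path and then evaluates the bond growth factor and the surplus without investment $R_0(t)$ for a hypothetical list of claim sizes. All numbers and function names are illustrative assumptions.

```python
import numpy as np


def occupation_times(switch_times, states, t, D):
    """Occupation times J_j(t) = int_0^t <alpha(s), e_j> ds of a piecewise-constant path:
    alpha(s) = e_{states[k]} for s in [switch_times[k], switch_times[k+1]).
    switch_times must start at 0 and be increasing; states are indexed 0..D-1."""
    J = np.zeros(D)
    ends = list(switch_times[1:]) + [np.inf]
    for start, end, s in zip(switch_times, ends, states):
        if start >= t:
            break
        J[s] += min(end, t) - start
    return J


def surplus_without_investment(r0, premium_rates, J, claim_sizes):
    """R_0(t) = r_0 + sum_j P_{0,j} J_j(t) - Z_0(t), with Z_0(t) the total of the claims in [0, t]."""
    return r0 + float(np.dot(premium_rates, J)) - float(np.sum(claim_sizes))


# Hypothetical path on [0, 5]: e_1 on [0, 1.5), e_2 on [1.5, 4), e_1 on [4, 5].
J = occupation_times([0.0, 1.5, 4.0], [0, 1, 0], t=5.0, D=2)
r_rates = np.array([0.02, 0.05])              # r_1, r_2
P0 = np.array([1.0, 1.2])                     # premium rates P_{0,1}, P_{0,2}
integrated_rate = float(np.dot(r_rates, J))   # int_0^5 r(s) ds
bond_growth = np.exp(integrated_rate)         # B(5) / B(0) from dB = r B dt
R0 = surplus_without_investment(10.0, P0, J, claim_sizes=[0.8, 1.7])
print(J, integrated_rate, bond_growth, R0)
```

Here the chain spends 2.5 time units in each state, so the integrated rate is $0.02 \cdot 2.5 + 0.05 \cdot 2.5 = 0.175$ and the surplus is $10 + 5.5 - 2.5 = 13$.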
We denote by A the space of all admissible portfolios.
Note that although condition (4) is strong, it is intuitively natural to consider only positive wealth for the insurance company. Define $\mathbb{G} := \{\mathcal{G}_t, t \in [0, T]\}$, where $\mathcal{G}_t := \mathcal{F}^B_t \vee \mathcal{F}^{Z_0}_t$, and for $n, j = 1, \dots, D$, let $\{C_{nj}(t), t \in [0, T]\}$ be a real-valued, $\mathbb{G}$-predictable, bounded stochastic process on $(\Omega, \mathcal{F}, P)$ such that, for each $t \in [0, T]$, $C_{nj}(t) \ge 0$ for $n \ne j$ and $\sum_{n=1}^D C_{nj}(t) = 0$, i.e., $C_{nn}(t) \le 0$. We consider a model uncertainty setup given by a probability measure $Q = Q^{\theta, C}$ which is equivalent to $P$, with Radon–Nikodym derivative on $\mathcal{F}_t$ given by (4.32), where $C := \{C_{nj}(t)\}_{n,j=1,\dots,D}$ is a family of rate matrices of the Markov chain $\alpha(t)$; see for example Dufour and Elliott (1999). For each $t \in [0, T]$, we set (4.33) We denote by $\mathcal{C}$ the space of all families of intensity matrices $C$ with bounded components. The Radon–Nikodym derivative or density process $G^{\theta, C}$ is given by (4.34), where the superscript $\top$ denotes the transpose. Here $(\theta, C)$ may be regarded as a scenario control. A control $\theta$ is admissible if $\theta$ is $\mathbb{F}$-progressively measurable, with $\theta(t) = \theta(t, \omega) \le 1$ for a.a. $(t, \omega) \in [0, T] \times \Omega$, and $\int_0^T \theta^2(t)\,dt < \infty$. We denote by $\Theta$ the space of such admissible processes.
Next, we formulate the optimal investment problem under model uncertainty. Let $U : (0, \infty) \to \mathbb{R}$ be a utility function which is strictly increasing, strictly concave and twice continuously differentiable. The objectives of the insurance firm and the market are the following: Problem 4.5 Find a portfolio process $\pi^* \in \mathcal{A}$ and a process $(\theta^*, C^*) \in \Theta \times \mathcal{C}$ such that (4.35) This problem can be seen as a zero-sum stochastic differential game between the insurance firm and the market. We have Then, it can easily be shown that $Y(t)$ is the solution to the following linear BSDE $Y(T) = U(X^\pi(T))$. (4.38) Noting that Problem 4.5 becomes Problem 4.6 Find a portfolio process $\pi^* \in \mathcal{A}$ and a process $(\theta^*, C^*) \in \Theta \times \mathcal{C}$ such that where $Y^{\theta, C, \pi}$ is described by the forward–backward system (4.31) and (4.38).
Theorem 4.7 Let $X^\pi(t)$ be the surplus process satisfying (4.31) with $r$ deterministic. Consider the optimization problem of finding $\pi^* \in \mathcal{A}$ and $(\theta^*, C^*) \in \Theta \times \mathcal{C}$ such that (4.35) (or equivalently (4.40)) holds, with In addition, suppose $U(x) = -e^{-\beta x}$, $\beta > 0$. Then the optimal investment $\pi^*(t)$ and the optimal scenario measure of the market $(\theta^*, C^*)$ are given respectively by (4.43) and the optimal $C^*$ satisfies the following constrained linear optimization problem: . . . , D, (4.44) subject to the linear constraints where $V_j$ is given by (4.67).
Moreover, assume that the family of rate matrices $(C_{nj})_{n,j=1,2}$ is bounded, and write $C_{nj}(t) \in [C_l(n, j), C_u(n, j)]$ with $C_l(n, j) < C_u(n, j)$, $n, j = 1, 2$.
Then, in this case, the optimal $C^*$ is given by (4.48). Proof One can see that this is a particular case of a zero-sum stochastic differential game for the forward–backward system of the form (2.5) and (2.7) with $\psi = \mathrm{Id}$, The Hamiltonian in Sect. 3 reduces to t, x, e n , y, z, k, v, π, θ, a, p, q, r The adjoint processes $A(t)$, $(p(t), q(t), r_0(t, \zeta), w(t))$ associated with the Hamiltonian are given by the following forward–backward SDE (4.51) It is easy to see that the functions $h$ and $H$ satisfy the assumptions of Theorem 3.9. Let us now find $\theta^*$ and $\pi^*$. First, maximizing the Hamiltonian $H$ with respect to $\pi$ gives the first order condition for an optimal $\pi^*$. (4.52) Put $P_1(t) := A(t)\,e^{-\beta X(t) e^{\int_t^T r(s)\,ds}}$ and $p(t) = \beta f(t, \alpha(t)) P_1(t)$; using once more the Itô–Lévy formula for jump-diffusion Markov regime-switching processes, we get where $(D^C_{0,\alpha}(t))_j = (D^C_0(t)\alpha(t))_j$. Comparing (4.56) with (4.51) and equating the terms in $dt$, $dB(t)$, $\tilde N_\alpha(d\zeta, dt)$ and $d\tilde\Phi_j(t)$, $j = 1, \dots, D$, respectively, we get (4.57) Substituting this into (4.52) gives (4.58), where the last equality follows since all coefficients are adapted to $\mathcal{F}_t$. Thus (4.42) in the theorem is proved. On the other hand, we also have
$$r_0(t, \zeta) = p(t)\,(1 + \theta(t))\big(e^{\beta\zeta e^{\int_t^T r(s)\,ds}} - 1\big) + \theta(t) \tag{4.59}$$
and, with the function $f(\cdot, e_n)$ satisfying the following backward differential equation: with the terminal condition $f(T, e_n) = 1$, for $n = 1, \dots, D$. For $r = 0$, the solution of such a backward equation can be found in Elliott and Siu (2011). Minimizing the Hamiltonian $H$ with respect to $\theta$ gives the first order condition for an optimal $\theta^*$.
As for the optimal $(C_{nj})_{n,j=1,\dots,D}$, the only part of the Hamiltonian that depends on $C$ is the sum $\sum_{j=1}^D (D^C_0(t)e_n - 1)_j \lambda_{nj} V_j(t)$. Hence minimizing the Hamiltonian with respect to $C$ is equivalent to minimizing the following system of differential operators. Hence, one can obtain the solution in the two-state case (since $C$ is bounded), with $V_j$ and $f_1$ given by (4.67) and (4.69), respectively. More specifically, if the Markov chain only has two states, we have to solve the following two linear programming problems:
$$\min_{C_{11}(t),\, C_{21}(t)} (V_1(t) - V_2(t))\,C_{21}(t) + \lambda_{21}(V_2(t) - V_1(t)) \quad \text{subject to } C_{11}(t) + C_{21}(t) = 0, \tag{4.71}$$
and
$$\min_{C_{12}(t),\, C_{22}(t)} (V_2(t) - V_1(t))\,C_{12}(t) + \lambda_{12}(V_1(t) - V_2(t)) \quad \text{subject to } C_{12}(t) + C_{22}(t) = 0. \tag{4.72}$$
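Both problems (4.71) and (4.72) have a linear objective in a single free variable ($C_{21}$, respectively $C_{12}$), the other entry being fixed by the column constraint, and the terms $\lambda_{21}(V_2 - V_1)$ and $\lambda_{12}(V_1 - V_2)$ do not depend on $C$. Under the box constraints $C_{nj}(t) \in [C_l(n, j), C_u(n, j)]$, a linear function on an interval attains its minimum at an endpoint, so the optimal choice is of bang-bang type. The Python sketch below implements this reasoning at a fixed time $t$; it assumes that the bounds on the diagonal entries $C_{11}, C_{22}$ do not bind, and the function names and numerical values are hypothetical.

```python
def argmin_linear_on_interval(coef, lower, upper):
    """Minimizer of c -> coef * c over [lower, upper]: the lower bound if coef > 0,
    the upper bound if coef < 0 (any point is optimal if coef == 0; we return lower)."""
    return upper if coef < 0 else lower


def optimal_C_two_states(V1, V2, bounds):
    """Bang-bang solution of (4.71)-(4.72) at a fixed time t under box constraints.

    bounds[(n, j)] = (C_l(n, j), C_u(n, j)) for the off-diagonal entries (2, 1), (1, 2).
    Assumes the bounds on the diagonal entries C_11, C_22 do not bind.
    """
    C21 = argmin_linear_on_interval(V1 - V2, *bounds[(2, 1)])   # objective of (4.71)
    C12 = argmin_linear_on_interval(V2 - V1, *bounds[(1, 2)])   # objective of (4.72)
    # Column constraints C_11 + C_21 = 0 and C_12 + C_22 = 0.
    return {(1, 1): -C21, (2, 1): C21, (1, 2): C12, (2, 2): -C12}


# Hypothetical values of V_1(t), V_2(t) and of the bounds.
C_star = optimal_C_two_states(2.0, 1.0, {(2, 1): (0.1, 0.9), (1, 2): (0.2, 0.6)})
print(C_star)
```

With $V_1(t) > V_2(t)$ the market pushes $C_{21}$ to its lower bound and $C_{12}$ to its upper bound, that is, it makes transitions towards the state with the lower value $V_j$ more likely, as expected for the minimizing player.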

Conclusion
In this paper, we use a general maximum principle for Markov regime-switching forward–backward stochastic differential equations to study optimal strategies for stochastic differential games. The proposed model covers the model uncertainty settings in Bordigoni et al. (2005), Elliott and Siu (2011), Faidi et al. (2011), Jeanblanc et al. (2012) and Øksendal and Sulem (2012). The results obtained are applied to study two problems. First, we study robust utility maximization under relative entropy penalization, and we show that the value function in this case is described by a quadratic regime-switching backward stochastic differential equation. Second, we study a problem of optimal investment of an insurance company under model uncertainty. This can be formulated as a two-player zero-sum stochastic differential game between the market and the insurance company, where the market controls the mean relative growth rate of the risky asset and the company controls the investment. We find "closed form" solutions for the optimal strategies of the insurance company and the market when the utility is of exponential type and the Markov chain has two states. Optimal control for delayed systems has also received attention recently, due to the memory dependence of some processes. In this situation, the dynamics at the present time $t$ depend not only on the state at time $t$ but also on a finite part of its past history. An extension of the present work to the delayed case could be of interest. Such results were derived in Menoukeu-Pamen (2015) in the case of no regime switching.
It would also be interesting to study the sensitivity of the optimal controls with respect to the given parameters. However, this is not straightforward since the parameters (coefficients) in this case depend on the regime and are thus stochastic. This is the object of future work.

This shows that (1) ⇒ (2). Conversely, using the fact that every bounded $\beta_i \in \mathcal{A}_i$ can be approximated by linear combinations of controls $\beta_i(t)$ of the form (3.11), the above argument can be reversed to show that (2) ⇒ (1).