First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function

Markov decision models (MDMs) used in practical applications are most often less complex than the underlying 'true' MDM. The reduction of model complexity is performed for several reasons. However, it is obviously of interest to know what kind of model reduction is reasonable (in regard to the optimal value) and what kind is not. In this article we propose a way to address this question. We introduce a sort of derivative of the optimal value as a function of the transition probabilities, which can be used to measure the (first-order) sensitivity of the optimal value w.r.t. changes in the transition probabilities. 'Differentiability' is obtained for a fairly broad class of MDMs, and the 'derivative' is specified explicitly. Our theoretical findings are illustrated by means of optimization problems in inventory control and mathematical finance.


Introduction
Already in the 1990s, Müller (1997a) pointed out that the impact of the transition probabilities of a Markov decision process (MDP) on the optimal value of a corresponding Markov decision model (MDM) cannot be ignored for practical purposes. For instance, in most cases the transition probabilities are unknown and have to be estimated by statistical methods. Moreover, in many applications the 'true' model is replaced by an approximate version or by a simplified, and thus less complex, variant. As a result, in practical applications the optimal strategy (and thus the optimal value) is most often computed on the basis of transition probabilities that differ from the underlying true transition probabilities. Therefore the sensitivity of the optimal value w.r.t. deviations in the transition probabilities is obviously of interest. Müller (1997a) showed that under some structural assumptions the optimal value in a discrete-time MDM depends continuously on the transition probabilities, and he established bounds for the approximation error. In the course of this, the distance between transition probabilities was measured by means of suitable probability metrics. Even earlier, Kolonko (1983) obtained analogous bounds in a MDM in which the transition probabilities depend on a parameter. Here the distance between transition probabilities was measured by means of the distance between the respective parameters. Error bounds for the expected total reward of discrete-time Markov reward processes were also specified by Van Dijk (1988) and Van Dijk and Puterman (1988). In the latter reference the authors also discussed the case of discrete-time Markov decision processes with countable state and action spaces.
In this article, we focus on the situation where the 'true' model is replaced by a less complex version (for a simple example, see Subsection 1.4.3 in the supplemental article Kern et al. (2020)). The reduction of model complexity in practical applications is common and performed for several reasons. Apart from computational aspects and the difficulty of considering all relevant factors, one major point is that statistical inference for certain transition probabilities can be costly in terms of both time and money. However, it is obviously of interest to know what kind of model reduction is reasonable and what kind is not. In the following we propose a way to address the latter question.
Our original motivation comes from the field of optimal logistics transportation planning, where ongoing projects like SYNCHRO-NET (https://www.synchronet.eu/) aim at stochastic decision models based on transition probabilities estimated from historical route information. Due to the lack of historical data for unlikely events, transition probabilities are often modeled in a simplified way. In fact, events with small probabilities are often ignored in the model. However, the impact of these events on the optimal value (here the minimal expected transportation costs) of the corresponding MDM may nevertheless be significant. The identification of unlikely but potentially cost-sensitive events is therefore a major challenge. In logistics planning, operations engineers have indeed become increasingly interested in comprehensibly quantifying the sensitivity of the optimal value w.r.t. the incorporation of unlikely events into the model. For background see, for instance, Holfeld and Simroth (2017) and Holfeld et al. (2018). The assessment of rare but risky events takes on greater importance also in other areas of application; see, for instance, Komljenovic et al. (2016), Yang et al. (2015) and references cited therein.
By an incorporation of an unlikely event into the model we mean, for instance, that under performance of an action $a$ at some time $n$ a previously impossible transition from one state $x$ to another state $y$ now gets assigned a small but strictly positive probability $\varepsilon$. Mathematically this means that the transition probability $P_n((x,a),\cdot)$ is replaced by $(1-\varepsilon)P_n((x,a),\cdot) + \varepsilon Q_n((x,a),\cdot)$ with $Q_n((x,a),\cdot) := \delta_y[\,\cdot\,]$, where $\delta_y$ is the Dirac measure at $y$. More generally, one could consider a change of the whole transition function (the family of all transition probabilities) $P$ to $(1-\varepsilon)P + \varepsilon Q$ with $\varepsilon > 0$ small. For operations engineers it is interesting to know how this change affects the optimal value $V_0(P)$. If the effect is minor, then an incorporation can be seen as superfluous, at least from a pragmatic point of view. If on the other hand the effect is significant, then the engineer should consider the option to extend the model and to make an effort to get access to statistical data for the extended model.
At this point it is worth mentioning that a change of the transition function from P to (1 − ε) P + ε Q with ε > 0 small can also have a different interpretation than an incorporation of an (unlikely) new event. It could also be associated with an incorporation of an (unlikely) divergence from the normal transition rules. See Sect. 4.5 for an example.
In this article, we will introduce an approach for quantifying the effect of changing the transition function from $P$ to $(1-\varepsilon)P + \varepsilon Q$, with $\varepsilon > 0$ small, on the optimal value $V_0(P)$ of the MDM. In view of $(1-\varepsilon)P + \varepsilon Q = P + \varepsilon(Q - P)$, we feel that it is reasonable to quantify the effect by a sort of derivative of the value functional $V_0$ at $P$ evaluated at direction $Q - P$. To some extent the 'derivative' $\dot V_{0;P}(Q-P)$ specifies the first-order sensitivity of $V_0(P)$ w.r.t. a change of $P$ as above. Take into account that
$$V_0\big(P + \varepsilon(Q-P)\big) - V_0(P) \,\approx\, \varepsilon \cdot \dot V_{0;P}(Q-P) \quad \text{for } \varepsilon > 0 \text{ small.} \tag{1}$$
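For a finite MDM the approximation in (1) can be inspected numerically: the difference quotients $(V_0(P + \varepsilon(Q-P)) - V_0(P))/\varepsilon$ stabilize as $\varepsilon$ decreases. The following is a minimal sketch; all data (two states, two actions, horizon 2) are illustrative and not taken from the article.

```python
import numpy as np

# Hypothetical two-state, two-action, horizon N = 2 MDM (illustrative data).
N = 2
A = (0, 1)
# P[a]: time-homogeneous transition matrix under action a.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.4, 0.6]])}
# r[a][x]: one-stage reward for action a in state x; rN: terminal reward.
r = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0])}
rN = np.array([0.0, 1.0])

def optimal_value(trans):
    # Backward induction: V_n(x) = max_a ( r[a][x] + sum_y trans[a][x, y] V_{n+1}(y) ).
    V = rN.copy()
    for _ in range(N):
        V = np.max([r[a] + trans[a] @ V for a in A], axis=0)
    return V

# Q models an "unlikely event": under action 0, previously impossible
# transitions receive positive mass (Q concentrates on them).
Q = {0: np.array([[0.0, 1.0], [1.0, 0.0]]), 1: P[1]}
mixed = lambda eps: {a: (1 - eps) * P[a] + eps * Q[a] for a in A}

V0 = optimal_value(P)[0]
# Difference quotients (V0(P + eps (Q - P)) - V0(P)) / eps; their limit as
# eps -> 0 is the first-order sensitivity that the 'derivative' captures.
slopes = [(optimal_value(mixed(eps))[0] - V0) / eps
          for eps in (1e-2, 1e-3, 1e-4)]
```

Here the quotients converge because, for $\varepsilon$ small enough, the optimal actions do not change, so the optimal value is polynomial in $\varepsilon$.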
To be able to compare the first-order sensitivity for (infinitely) many different $Q$, it is favourable to know that the approximation in (1) is uniform in $Q \in K$ for preferably large sets $K$ of transition functions. Moreover, it is not always possible to specify the relevant $Q$ exactly. For that reason it would also be good to have robustness (i.e. some sort of continuity) of $\dot V_{0;P}(Q-P)$ in $Q$. These two points led us to focus on a variant of tangential $S$-differentiability as introduced by Sebastião e Silva (1956) and Averbukh and Smolyanov (1967) (here $S$ is a family of sets $K$ of transition functions). In Section 3 we present a result on '$S$-differentiability' of $V_0$ for the family $S$ of all relatively compact sets of admissible transition functions and a reasonably broad class of MDMs, where we measure the distance between transition functions by means of metrics based on probability metrics as in Müller (1997a).
The 'derivative' $\dot V_{0;P}(Q-P)$ of the optimal value functional $V_0$ at $P$ quantifies the effect of a change from $P$ to $(1-\varepsilon)P + \varepsilon Q$, with $\varepsilon > 0$ small, assuming that after the change the strategy $\pi$ (tuple of the underlying decision rules) is chosen such that it optimizes the target value $V^\pi_0(\tilde P)$ (e.g. expected total costs or rewards) in $\pi$ under the new transition function $\tilde P := (1-\varepsilon)P + \varepsilon Q$. On the other hand, practitioners are also interested in quantifying the impact of a change of $P$ when the optimal strategy (under $P$) is kept after the change. Such a quantification would answer the question: how differently does a strategy derived in a simplified MDM perform in a more complex (more realistic) variant of the MDM? Since the 'derivative' $\dot V^\pi_{0;P}(Q-P)$ of the functional $V^\pi_0$ under a fixed strategy $\pi$ turns out to be a building block for the derivative $\dot V_{0;P}(Q-P)$ of the optimal value functional $V_0$ at $P$, our elaborations cover both situations anyway. For a fixed strategy $\pi$ we obtain '$S$-differentiability' of $V^\pi_0$ even for the broader family $S$ of all bounded sets of admissible transition functions.
The 'derivative' which we propose to regard as a measure for the first-order sensitivity will formally be introduced in Definition 7. This definition is applicable to quite general finite time horizon MDMs and might look somewhat cumbersome at first glance. However, in the special case of a finite state space and finite action spaces, a situation one faces in many practical applications, the proposed 'differentiability' boils down to a rather intuitive concept. This will be explained in Section 1 of the supplemental article Kern et al. (2020) with a minimum of notation and terminology. There we will also reformulate a backward iteration scheme for the computation of the 'derivative' (which can be deduced from our main result, Theorem 1) in the discrete case, and we will discuss an example.
In Section 2 we formally introduce quite general MDMs in the fashion of the standard monographs Bäuerle and Rieder (2011), Hernández-Lerma and Lasserre (1996), Hinderer (1970), and Puterman (1994). Since it is important to have an elaborate notation in order to formulate our main result, we are very precise in Section 2. As a result, this section is a little longer compared to the respective sections in other articles on MDMs. In Section 3 we carefully introduce our notion of 'differentiability' and state our main result concerning the computation of the 'derivative' of the value functional. In Section 4 we will apply the results of Section 3 to assess the impact of one or more than one unlikely but substantial shock in the dynamics of an asset on the solution of a terminal wealth problem in a (simple) financial market model free of shocks. This example somehow motivates the general set-up chosen in Sections 2-3. All results of this article are proven in Sections 3-5 of the supplemental article Kern et al. (2020). For the convenience of the reader we recall in Section 6 of the supplemental article Kern et al. (2020) a result on the existence of optimal strategies in general MDMs. Section 7 of the supplemental article Kern et al. (2020) contains an auxiliary topological result.

Formal definition of Markov decision model
Let $E$ be a non-empty set equipped with a $\sigma$-algebra $\mathcal{E}$, referred to as state space. Let $N \in \mathbb{N}$ be a fixed finite time horizon (or planning horizon) in discrete time. For each point of time $n = 0, \dots, N-1$ and each state $x \in E$, let $A_n(x)$ be a non-empty set. The elements of $A_n(x)$ will be seen as the admissible actions (or controls) at time $n$ in state $x$. For each $n = 0, \dots, N-1$, let
$$A_n := \bigcup_{x \in E} A_n(x) \quad \text{and} \quad D_n := \big\{(x,a) \in E \times A_n : a \in A_n(x)\big\}.$$
The elements of $A_n$ can be seen as the actions that may basically be selected at time $n$, whereas the elements of $D_n$ are the possible state-action combinations at time $n$. For our subsequent analysis, we equip $A_n$ with a $\sigma$-algebra $\mathcal{A}_n$, and let $\mathcal{D}_n := (\mathcal{E} \otimes \mathcal{A}_n) \cap D_n$ be the trace of the product $\sigma$-algebra $\mathcal{E} \otimes \mathcal{A}_n$ in $D_n$. Recall that a map $P_n : D_n \times \mathcal{E} \to [0,1]$ is said to be a probability kernel (or Markov kernel) from $(D_n, \mathcal{D}_n)$ to $(E, \mathcal{E})$ if $P_n((x,a), \cdot)$ is a probability measure on $(E, \mathcal{E})$ for every $(x,a) \in D_n$ and $P_n(\cdot, B)$ is $(\mathcal{D}_n, \mathcal{B}([0,1]))$-measurable for every $B \in \mathcal{E}$; here and below $\mathcal{M}_1(E)$ is the set of all probability measures on $(E, \mathcal{E})$.

Markov decision process
In this subsection, we will give a formal definition of an $E$-valued (discrete-time) Markov decision process (MDP) associated with a given initial state, a given transition function and a given strategy. By definition, a (Markov decision) transition (probability) function is an $N$-tuple $P = (P_0, \dots, P_{N-1})$ whose $n$-th entry $P_n$ is a probability kernel from $(D_n, \mathcal{D}_n)$ to $(E, \mathcal{E})$. In this context $P_n$ will be referred to as the one-step transition (probability) kernel at time $n$ (or from time $n$ to $n+1$), and the probability measure $P_n((x,a), \cdot)$ is referred to as the one-step transition probability at time $n$ (or from time $n$ to $n+1$) given state $x$ and action $a$. We denote by $\mathbf{P}$ the set of all transition functions.
We will assume that the actions are performed by a so-called N -stage strategy (or N -stage policy). An (N -stage) strategy is an N -tuple of decision rules at times n = 0, . . . , N − 1, where a decision rule at time n is an (E, A n )-measurable map f n : E → A n satisfying f n (x) ∈ A n (x) for all x ∈ E. Note that a decision rule at time n is (deterministic and) 'Markovian' since it only depends on the current state and is independent of previous states and actions. We denote by F n the set of all decision rules at time n, and assume that F n is non-empty. Hence a strategy is an element of the set F 0 × · · · × F N −1 , and this set can be seen as the set of all strategies. Moreover, we fix for any n = 0, . . . , N − 1 some F n ⊆ F n which can be seen as the set of all admissible decision rules at time n. In particular, the set Π := F 0 × · · · × F N −1 can be seen as the set of all admissible strategies.
For any transition function $P = (P_n)_{n=0}^{N-1} \in \mathbf{P}$, strategy $\pi = (f_n)_{n=0}^{N-1} \in \Pi$, and time point $n \in \{0, \dots, N-1\}$, we can derive from $P_n$ a probability kernel $P^\pi_n$ from $(E, \mathcal{E})$ to $(E, \mathcal{E})$ by setting $P^\pi_n(x, \cdot) := P_n((x, f_n(x)), \cdot)$. The probability measure $P^\pi_n(x, \cdot)$ can be seen as the one-step transition probability at time $n$ given state $x$ when the transitions and actions are governed by $P$ and $\pi$, respectively. Now, consider the measurable space $(\Omega, \mathcal{F}) := (E^{N+1}, \mathcal{E}^{\otimes(N+1)})$. For any $x_0 \in E$, $P = (P_n)_{n=0}^{N-1} \in \mathbf{P}$, and $\pi \in \Pi$, define the probability measure
$$\mathbb{P}_{x_0,P;\pi} := \delta_{x_0} \otimes P^\pi_0 \otimes \cdots \otimes P^\pi_{N-1} \tag{3}$$
on $(\Omega, \mathcal{F})$, where $x_0$ should be seen as the initial state of the MDP to be constructed. The right-hand side of (3) is the usual product of the probability measure $\delta_{x_0}$ and the kernels $P^\pi_0, \dots, P^\pi_{N-1}$; for details see display (16) in Section 2 of the supplemental article Kern et al. (2020). Moreover, let $X = (X_0, \dots, X_N)$ be the identity on $\Omega$, i.e.
$$X_n(\omega_0, \dots, \omega_N) := \omega_n, \quad n = 0, \dots, N. \tag{4}$$
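For a finite state space the product measure in (3) can be written out explicitly as a sum over paths. The following is a minimal sketch; the state space, kernels, and decision rules are all illustrative.

```python
import itertools

# Sketch: the product law delta_{x0} (x) P^pi_0 (x) ... (x) P^pi_{N-1}
# on E^{N+1} for a finite E, where P^pi_n(x, .) = P_n((x, f_n(x)), .).
E = [0, 1]
N = 2
# Time-homogeneous kernel for simplicity: P_kernel[(x, a)][y] = P((x,a), {y}).
P_kernel = {(0, 'stay'): {0: 0.9, 1: 0.1}, (0, 'move'): {0: 0.2, 1: 0.8},
            (1, 'stay'): {0: 0.1, 1: 0.9}, (1, 'move'): {0: 0.7, 1: 0.3}}
# A Markovian strategy pi = (f_0, f_1): each decision rule maps states to actions.
pi = [lambda x: 'stay', lambda x: 'move']

def path_law(x0):
    """Probability of each path (x_0, ..., x_N) under the product measure."""
    law = {}
    for path in itertools.product(E, repeat=N + 1):
        if path[0] != x0:       # the initial distribution is delta_{x0}
            continue
        p = 1.0
        for n in range(N):      # multiply the one-step transition probabilities
            p *= P_kernel[(path[n], pi[n](path[n]))][path[n + 1]]
        law[path] = p
    return law

law = path_law(0)
```

The path probabilities sum to one, and the marginal of the first coordinate is the Dirac measure at the initial state.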
Note that, for any $x_0 \in E$, $P = (P_n)_{n=0}^{N-1} \in \mathbf{P}$, and $\pi \in \Pi$, the map $X$ can be regarded as an $(E^{N+1}, \mathcal{E}^{\otimes(N+1)})$-valued random variable on the probability space $(\Omega, \mathcal{F}, \mathbb{P}_{x_0,P;\pi})$ with distribution $\delta_{x_0} \otimes P^\pi_0 \otimes \cdots \otimes P^\pi_{N-1}$. Lemma 1 in the supplemental article Kern et al. (2020) establishes assertions (i)-(iv) on the (conditional) distributions of $X_0, \dots, X_N$ under $\mathbb{P}_{x_0,P;\pi}$. The formulation of (ii)-(iv) is somewhat sloppy, because in general a (regular version of the) factorized conditional distribution of $X$ given $Y$ under $\mathbb{P}_{x_0,P;\pi}$ (evaluated at a fixed set $B \in \mathcal{E}$) is only $\mathbb{P}^{Y}_{x_0,P;\pi}$-a.s. unique. So assertion (iv) in fact means that the probability kernel $P_n((\,\cdot\,, f_n(\cdot)), \cdot)$ provides a (regular version of the) factorized conditional distribution of $X_{n+1}$ given $X_n$ under $\mathbb{P}_{x_0,P;\pi}$, and analogously for (ii) and (iii). Note that the factorized conditional distribution in part (ii) is constant w.r.t. $x_0 \in E$. Assertions (iii) and (iv) together imply that the temporal evolution of $X$ is Markovian. This justifies the following terminology.

Markov decision model and value function
Maintain the notation and terminology introduced in Sect. 2.1. In this subsection, we will first define a (discrete-time) Markov decision model (MDM) and subsequently introduce the corresponding value function. The latter will be derived from a reward maximization problem. Fix $P \in \mathbf{P}$, and let, for each point of time $n = 0, \dots, N-1$,
$$r_n : D_n \longrightarrow \mathbb{R}$$
be a $(\mathcal{D}_n, \mathcal{B}(\mathbb{R}))$-measurable map, referred to as one-stage reward function. Here $r_n(x,a)$ specifies the one-stage reward when action $a$ is taken at time $n$ in state $x$. Let $r_N : E \to \mathbb{R}$ be an $(\mathcal{E}, \mathcal{B}(\mathbb{R}))$-measurable map, referred to as terminal reward function. The value $r_N(x)$ specifies the reward of being in state $x$ at terminal time $N$.
Denote by $A$ the family of all sets $A_n(x)$, $n = 0, \dots, N-1$, $x \in E$, and set $r := (r_n)_{n=0}^{N}$. Moreover, let $X$ be defined as in (4) and recall Definition 1. Then we define our MDM as follows.
Definition 2 (MDM) The quintuple (X, A, P, Π, r) is called (discrete-time) Markov decision model (MDM) associated with the family of action spaces A, transition function P ∈ P, set of admissible strategies Π , and reward functions r.
In the sequel we will always assume that a MDM (X, A, P, Π, r) satisfies the following Assumption (A). In Sect. 3.1 we will discuss some conditions on the MDM under which Assumption (A) holds. We will use $\mathbb{E}^{x_0,P;\pi}_{n,x_n}$ to denote the expectation w.r.t. the factorized conditional distribution $\mathbb{P}_{x_0,P;\pi}[\,\cdot\,|\,X_n = x_n]$. For $n = 0$, we clearly have $\mathbb{P}_{x_0,P;\pi}[\,\cdot\,|\,X_0 = x_0] = \mathbb{P}_{x_0,P;\pi}[\,\cdot\,]$ for every $x_0 \in E$; see Lemma 1 in the supplemental article Kern et al. (2020). In what follows we use the convention that the sum over the empty set is zero.
Under Assumption (A) we may define in a MDM (X, A, P, Π, r), for any $\pi = (f_n)_{n=0}^{N-1} \in \Pi$ and $n = 0, \dots, N$, a map $V^{P;\pi}_n : E \to \mathbb{R}$ through
$$V^{P;\pi}_n(x_n) := \mathbb{E}^{x_0,P;\pi}_{n,x_n}\Big[\sum_{k=n}^{N-1} r_k\big(X_k, f_k(X_k)\big) + r_N(X_N)\Big]. \tag{5}$$
As a factorized conditional expectation this map is $(\mathcal{E}, \mathcal{B}(\mathbb{R}))$-measurable (for any $\pi \in \Pi$ and $n = 0, \dots, N$). Note that for $n = 1, \dots, N$ the right-hand side of (5) does not depend on $x_0$; see Lemma 2 in the supplemental article Kern et al. (2020). Therefore the map $V^{P;\pi}_n(\cdot)$ need not be equipped with an index $x_0$. The value $V^{P;\pi}_n(x_n)$ specifies the expected total reward from time $n$ to $N$ of $X$ under $\mathbb{P}_{x_0,P;\pi}$ when strategy $\pi$ is used and $X$ is in state $x_n$ at time $n$. It is natural to ask for those strategies $\pi \in \Pi$ for which the expected total reward from time $0$ to $N$ is maximal for all initial states $x_0 \in E$. This results in the following optimization problem:
$$\text{maximize } V^{P;\pi}_0(x_0) \text{ over } \pi \in \Pi, \text{ for every } x_0 \in E. \tag{6}$$
If a solution $\pi_P$ to the optimization problem (6) (in the sense of Definition 4 ahead) exists, then the corresponding maximal expected total reward is given by the so-called value function (at time 0).
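For a finite state space, $V^{P;\pi}_n$ can be evaluated by the standard backward recursion $V_N = r_N$ and $V_n(x) = r_n(x, f_n(x)) + \int_E V_{n+1}\, dP_n((x, f_n(x)), \cdot)$. A minimal sketch with illustrative data (the reward function and kernels below are hypothetical):

```python
import numpy as np

# Policy evaluation by backward recursion on a finite state space.
N = 3
E = range(2)
P_kernel = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),   # kernel under action 0
            1: np.array([[0.5, 0.5], [0.1, 0.9]])}   # kernel under action 1
r = lambda n, x, a: float(x == a)                    # hypothetical one-stage reward
r_terminal = np.array([0.0, 1.0])                    # terminal reward r_N
policy = [[0, 1], [1, 1], [0, 0]]                    # f_n(x) = policy[n][x]

def policy_value(policy):
    """V_n(x) = r_n(x, f_n(x)) + sum_y P_n((x, f_n(x)), y) * V_{n+1}(y)."""
    V = r_terminal.copy()
    for n in range(N - 1, -1, -1):
        f_n = policy[n]
        V = np.array([r(n, x, f_n[x]) + P_kernel[f_n[x]][x] @ V for x in E])
    return V
```

The returned array is the expected total reward from time 0 to $N$ as a function of the initial state.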
Definition 3 (Value function) For a MDM (X, A, P, Π, r) the value function at time $n \in \{0, \dots, N\}$ is the map $V^P_n : E \to \mathbb{R}$ defined by
$$V^P_n(x) := \sup_{\pi \in \Pi} V^{P;\pi}_n(x), \quad x \in E. \tag{7}$$
Note that the value function $V^P_n$ is well defined due to Assumption (A) but not necessarily $(\mathcal{E}, \mathcal{B}(\mathbb{R}))$-measurable. The measurability holds true, for example, if the sets $F_n, \dots, F_{N-1}$ are at most countable or if conditions (a)-(c) of Theorem 2 in the supplemental article Kern et al. (2020) are satisfied; see also Remark 1(i) in the supplemental article Kern et al. (2020).
Definition 4 (Optimal strategy) In a MDM (X, A, P, Π, r) a strategy $\pi_P \in \Pi$ is called optimal w.r.t. $P$ if
$$V^{P;\pi_P}_0(x_0) = V^P_0(x_0) \quad \text{for all } x_0 \in E. \tag{8}$$
In this case $V^{P;\pi_P}_0(x_0)$ is called optimal value (function), and we denote by $\Pi(P)$ the set of all optimal strategies w.r.t. $P$. Further, for any given $\delta > 0$, a strategy $\pi_{P;\delta} \in \Pi$ is called $\delta$-optimal w.r.t. $P$ in a MDM (X, A, P, Π, r) if
$$V^P_0(x_0) - \delta \,\le\, V^{P;\pi_{P;\delta}}_0(x_0) \quad \text{for all } x_0 \in E, \tag{9}$$
and we denote by $\Pi(P; \delta)$ the set of all $\delta$-optimal strategies w.r.t. $P$.
Note that condition (8) requires that $\pi_P \in \Pi$ be an optimal strategy for all possible initial states $x_0 \in E$. However, in some situations it might be sufficient to ensure that $\pi_P \in \Pi$ is an optimal strategy only for some fixed initial state $x_0$. For a brief discussion of the existence and computation of optimal strategies, see Section 6 of the supplemental article Kern et al. (2020).
Remark 1 (i) In practice, the choice of an action can possibly be based on historical observations of states and actions. In particular one could relinquish the Markov property of the decision rules and allow them to depend also on previous states and actions. Then one might hope that the corresponding (deterministic) history-dependent strategies improve the optimal value of a MDM (X, A, P, Π, r). However, it is known that the optimal value of a MDM (X, A, P, Π, r) cannot be enhanced by considering history-dependent strategies; see, e.g., Theorem 18.4 in Hinderer (1970) or Theorem 4.5.1 in Puterman (1994). (ii) Instead of considering the reward maximization problem (6) one could as well be interested in minimizing expected total costs over the time horizon $N$. In this case, one can maintain the previous notation and terminology when regarding the functions $r_n$ and $r_N$ as the one-stage costs and the terminal costs, respectively. The only thing one has to do is to replace "sup" by "inf" in the representation (7) of the value function. Accordingly, a strategy $\pi_{P;\delta} \in \Pi$ will be $\delta$-optimal for a given $\delta > 0$ if in condition (9) "$-\delta$" and "$\le$" are replaced by "$+\delta$" and "$\ge$".
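For finite state and action spaces the optimization problem (6) is solved by backward induction, and the cost-minimization variant of Remark 1(ii) is obtained by swapping the maximization for a minimization. A minimal sketch with illustrative data:

```python
import numpy as np

# Backward induction for a finite MDM: returns the value function and an
# optimal Markovian strategy. Passing best=min yields the cost formulation.
N = 2
E = range(2)
A = (0, 1)
P_kernel = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
            1: np.array([[0.5, 0.5], [0.1, 0.9]])}
r = {0: np.array([1.0, 0.0]), 1: np.array([0.2, 1.5])}  # r[a][x], illustrative
r_terminal = np.array([0.0, 1.0])

def solve(best=max):
    V = r_terminal.copy()
    strategy = [None] * N
    for n in range(N - 1, -1, -1):
        q = {a: r[a] + P_kernel[a] @ V for a in A}          # Q-values at time n
        strategy[n] = [best(A, key=lambda a: q[a][x]) for x in E]
        V = np.array([q[strategy[n][x]][x] for x in E])     # envelope of Q-values
    return V, strategy

V_opt, pi_opt = solve(max)   # reward maximization, problem (6)
V_min, pi_min = solve(min)   # cost minimization, Remark 1(ii)
```

The recovered strategy is deterministic and Markovian, in line with Remark 1(i): history-dependent strategies cannot improve on it.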

'Differentiability' in P of the optimal value
In this section, we show that the value function of a MDM, regarded as a real-valued functional on a set of transition functions, is 'differentiable' in a certain sense. The notion of 'differentiability' we use for functionals defined on a set of admissible transition functions will be introduced in Sect. 3.4. The motivation for our notion of 'differentiability' was discussed subsequent to (1). Before defining 'differentiability' in a precise way, we will explain in Sects. 3.2-3.3 how we measure the distance between transition functions. In Sects. 3.5-3.6 we will specify the 'Hadamard derivative' of the value function. At first, however, we will discuss in Sect. 3.1 some conditions under which Assumption (A) holds true. Throughout this section, $A$, $\Pi$, and $r$ are fixed.

Bounding functions
Recall from Section 2 that $\mathbf{P}$ stands for the set of all transition functions, i.e. of all $N$-tuples $P = (P_n)_{n=0}^{N-1}$ of probability kernels $P_n$ from $(D_n, \mathcal{D}_n)$ to $(E, \mathcal{E})$. The following definition is adapted from Bäuerle and Rieder (2011), Müller (1997a), and Wessels (1977). Conditions (a)-(c) of this definition are sufficient for the well-definedness of $V^{P;\pi}_n$ (and $V^P_n$); see Lemma 1 ahead.
Definition 5 (Bounding function) Let $\mathbf{P}' \subseteq \mathbf{P}$. A gauge function $\psi : E \to \mathbb{R}_{\ge 1}$ is called a bounding function for the family of MDMs $\{(X, A, P, \Pi, r) : P \in \mathbf{P}'\}$ if there exist finite constants $K_1, K_2, K_3 > 0$ such that the following conditions hold for any $n = 0, \dots, N-1$ and $P = (P_n)_{n=0}^{N-1} \in \mathbf{P}'$:
(a) $|r_n(x,a)| \le K_1\, \psi(x)$ for all $(x,a) \in D_n$;
(b) $|r_N(x)| \le K_2\, \psi(x)$ for all $x \in E$;
(c) $\int_E \psi(y)\, P_n((x,a), dy) \le K_3\, \psi(x)$ for all $(x,a) \in D_n$.
If $\mathbf{P}' = \{P\}$ for some $P \in \mathbf{P}$, then $\psi$ is called a bounding function for the MDM (X, A, P, Π, r).
Note that the conditions in Definition 5 do not depend on the set Π . That is, the terminology bounding function is independent of the set of all (admissible) strategies. Also note that conditions (a) and (b) can be satisfied by unbounded reward functions.
The following lemma, whose proof can be found in Subsection 3.1 of the supplemental article Kern et al. (2020), ensures that Assumption (A) is satisfied when the underlying MDM possesses a bounding function.
Lemma 1 Let $\mathbf{P}' \subseteq \mathbf{P}$. If the family of MDMs $\{(X, A, P, \Pi, r) : P \in \mathbf{P}'\}$ possesses a bounding function $\psi$, then Assumption (A) is satisfied for any $P \in \mathbf{P}'$. Moreover, the expectation in Assumption (A) is even uniformly bounded w.r.t. $P \in \mathbf{P}'$, and $V^{P;\pi}_n(\cdot)$ is contained in $\mathcal{M}^\psi(E)$ for any $P \in \mathbf{P}'$, $\pi \in \Pi$, and $n = 0, \dots, N$.
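In the discrete case the bounding-function conditions can be checked numerically. The sketch below uses illustrative data and assumes the standard form of conditions (a)-(c) from the literature the article cites, namely $|r_n| \le K_1 \psi$, $|r_N| \le K_2 \psi$, and $\int \psi(y)\, P_n((x,a), dy) \le K_3 \psi(x)$; it computes the smallest admissible constants.

```python
import numpy as np

# Smallest constants K1, K2, K3 for a candidate gauge function psi >= 1
# in a finite MDM (all data illustrative).
psi = np.array([1.0, 2.0])                           # candidate gauge function
P_kernel = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
            1: np.array([[0.5, 0.5], [0.1, 0.9]])}
r = {0: np.array([1.0, -3.0]), 1: np.array([0.5, 2.0])}   # r[a][x]
r_terminal = np.array([0.0, 1.5])

def bounding_constants():
    # (a): |r_n(x, a)| <= K1 * psi(x)
    K1 = max(np.max(np.abs(r[a]) / psi) for a in r)
    # (b): |r_N(x)| <= K2 * psi(x)
    K2 = np.max(np.abs(r_terminal) / psi)
    # (c): sum_y psi(y) P((x, a), y) <= K3 * psi(x)
    K3 = max(np.max((P_kernel[a] @ psi) / psi) for a in P_kernel)
    return K1, K2, K3

K1, K2, K3 = bounding_constants()
```

Finiteness of all three constants (automatic on a finite space) is exactly what Lemma 1 needs; on infinite spaces the same ratios must be bounded uniformly.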

Metric on set of probability measures
In Sect. 3.4 we will work with a (semi-)metric (on a set of transition functions) to be defined in (11) below. As is common in the theory of probability metrics (see, e.g., p. 10 ff. in Rachev 1991), we allow the distance between two probability measures and the distance between two transition functions to be infinite. That is, we adopt the axioms of a (semi-)metric but allow a (semi-)metric to take values in $\overline{\mathbb{R}}_{\ge 0} := \mathbb{R}_{\ge 0} \cup \{\infty\}$ rather than only in $\mathbb{R}_{\ge 0} := [0, \infty)$.
Let $\psi$ be any gauge function, denote by $\mathcal{M}^\psi(E)$ the set of all $(\mathcal{E}, \mathcal{B}(\mathbb{R}))$-measurable functions $h : E \to \mathbb{R}$ with $|h| \le C_h\, \psi$ for some finite constant $C_h > 0$, and let $\mathcal{M}^\psi_1(E)$ be the set of all probability measures $\mu$ on $(E, \mathcal{E})$ with $\int_E \psi \, d\mu < \infty$. For a fixed subset $M \subseteq \mathcal{M}^\psi(E)$, define
$$d_M(\mu, \nu) := \sup_{h \in M} \Big| \int_E h \, d\mu - \int_E h \, d\nu \Big|, \quad \mu, \nu \in \mathcal{M}^\psi_1(E). \tag{10}$$
Note that (10) is symmetric and fulfills the triangle inequality, i.e. $d_M$ provides a semi-metric. If $M$ separates points in $\mathcal{M}^\psi_1(E)$, then $d_M$ is even a metric. It is sometimes called integral probability metric or probability metric with a $\zeta$-structure; see Müller (1997b) and Zolotarev (1983). In some situations the (semi-)metric $d_M$ (with $M$ fixed) can be represented by the right-hand side of (10) with $M$ replaced by a different subset $M'$ of $\mathcal{M}^\psi(E)$. Each such set $M'$ is said to be a generator of $d_M$. The largest generator of $d_M$ is called the maximal generator of $d_M$ and denoted by $\widehat{M}$; that is, $\widehat{M}$ is defined to be the set of all $h \in \mathcal{M}^\psi(E)$ satisfying $|\int_E h \, d\mu - \int_E h \, d\nu| \le d_M(\mu, \nu)$ for all $\mu, \nu \in \mathcal{M}^\psi_1(E)$. We now give some examples for the distance $d_M$. The metrics in the first four examples were already mentioned in Müller (1997a, b). In the last three examples $d_M$ metricizes the $\psi$-weak topology, which is defined to be the coarsest topology on $\mathcal{M}^\psi_1(E)$ under which all maps $\mu \mapsto \int_E h\, d\mu$ with $h : E \to \mathbb{R}$ continuous and $|h| \le C_h\, \psi$ are continuous; if $\psi \equiv 1$, the $\psi$-weak topology is nothing but the classical weak topology. Characterizations of those subsets of $\mathcal{M}^\psi_1(E)$ on which the relative $\psi$-weak topology coincides with the relative weak topology can be found in the literature; see also Müller (1997b).
Among the examples are the Kolmogorov metric $d_{\mathrm{Kolm}}(\mu,\nu) = \sup_{t \in \mathbb{R}} |F_\mu(t) - F_\nu(t)|$, where $F_\mu$ and $F_\nu$ refer to the distribution functions of $\mu$ and $\nu$, respectively (the corresponding set $M_{\mathrm{Kolm}}$ clearly separates points), the total variation metric $d_{\mathrm{TV}}$, and the Kantorovich metric $d_{\mathrm{Kant}}$; see Müller (1997b) and Dudley (2002). It is known (see, e.g., Theorem 11.3.3 in Dudley 2002, and recall Vallender 1974) that the Kantorovich metric coincides with the $L^1$-Wasserstein metric; in this case the $\psi$-weak topology is also referred to as the $L^1$-weak topology. Note that the $L^1$-Wasserstein metric is a conventional metric for measuring the distance between probability distributions; see, for instance, Dall'Aglio (1956), Kantorovich and Rubinstein (1958), and Vallender (1974). Although the Kantorovich metric is a popular and well-established metric, for the application in Section 4 we will need a generalization from $\alpha = 1$ to $\alpha \in (0,1]$, based on $\alpha$-Hölder continuous functions and denoted by $d_{\mathrm{Höl},\alpha}$. Especially when dealing with risk-averse utility functions (as, e.g., in Section 4) this metric can be beneficial. Lemma 9 in Section 7 of the supplemental article Kern et al. (2020) shows that if $E$ is complete and separable then $d_{\mathrm{Höl},\alpha}$ metricizes the $\psi$-weak topology on $\mathcal{M}^\psi_1(E)$.
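On a finite state space the supremum in (10) can be evaluated directly once a finite generator of the function class is at hand. A minimal sketch for two such classes; the discretizations below are illustrative, not the article's exact sets.

```python
import itertools
import numpy as np

# Integral probability metric d_M(mu, nu) = sup_{h in M} |int h dmu - int h dnu|
# on a three-point state space (illustrative data).
E = np.array([0.0, 1.0, 2.0])
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.4, 0.3, 0.3])

def d_M(mu, nu, M):
    return max(abs(h @ mu - h @ nu) for h in M)

# Kolmogorov metric: M = indicators of half-lines (-inf, t], t running over E,
# so d_M equals the sup-distance of the distribution functions.
M_kolm = [np.array([x <= t for x in E], dtype=float) for t in E]

# Total-variation-type metric: functions with values in [-1, 1]; on a finite
# space the extreme points h in {-1, +1}^E suffice, and with this
# normalisation d_M equals the total variation norm of mu - nu.
M_tv = [np.array(s) for s in itertools.product((-1.0, 1.0), repeat=len(E))]

d_kolm = d_M(mu, nu, M_kolm)
d_tv = d_M(mu, nu, M_tv)
```

As expected, the total-variation-type distance dominates the Kolmogorov distance, since its function class is richer.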

Metric on set of transition functions
Maintain the notation from Sect. 3.2. Let us denote by $\mathbf{P}_\psi$ the set of all transition functions $P = (P_n)_{n=0}^{N-1} \in \mathbf{P}$ satisfying $\int_E \psi(y)\, P_n((x,a), dy) < \infty$ for all $(x,a) \in D_n$ and $n = 0, \dots, N-1$. That is, $\mathbf{P}_\psi$ consists of those transition functions for which the integrals $\int_E h\, dP_n((x,a), \cdot)$, $h \in \mathcal{M}^\psi(E)$, exist and are finite. In particular, for two transition functions $P = (P_n)_{n=0}^{N-1}$ and $Q = (Q_n)_{n=0}^{N-1}$ from $\mathbf{P}_\psi$ and for another gauge function $\phi : E \to \mathbb{R}_{\ge 1}$ we may define
$$d^\phi_{\infty,M}(P, Q) := \max_{n = 0, \dots, N-1} \; \sup_{(x,a) \in D_n} \frac{1}{\phi(x)}\, d_M\big(P_n((x,a), \cdot),\, Q_n((x,a), \cdot)\big). \tag{11}$$
Maybe apart from the factor $1/\phi(x)$, the definition of $d^\phi_{\infty,M}(P, Q)$ in (11) is quite natural and in line with the definition of a distance introduced by Müller (1997a, p. 880). In Müller (1997a), Müller considers time-homogeneous MDMs, so that the transition kernels do not depend on $n$. He fixed a state $x$ and took the supremum only over all admissible actions $a$ in state $x$; that is, for any $x \in E$ he defined the distance between $P((x, \cdot), \cdot)$ and $Q((x, \cdot), \cdot)$ as the supremum of $d_M(P((x,a), \cdot), Q((x,a), \cdot))$ over $a \in A(x)$. To obtain a reasonable distance between $P_n$ and $Q_n$ it is, however, natural to take the supremum of the distance between $P_n((x, \cdot), \cdot)$ and $Q_n((x, \cdot), \cdot)$ w.r.t. $d_M$ uniformly over $a$ and over $x$.
The factor $1/\phi(x)$ in (11) makes the (semi-)metric $d^\phi_{\infty,M}$ less strict than the (semi-)metric $d^1_{\infty,M}$, which is defined as in (11) with $\phi \equiv 1$. For a motivation of the factor $1/\phi(x)$, see part (iii) of Remark 2 and the discussion afterwards.
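For finite state and action spaces the distance (11) reduces to a finite maximization. A minimal sketch for a single time step; the data, the gauge function $\phi$, and the choice of the Kolmogorov distance as $d_M$ are all illustrative.

```python
import numpy as np

# Distance between two one-step kernels: sup over (x, a) of
# d_M(P((x,a), .), Q((x,a), .)) weighted by 1/phi(x).  Illustrative data.
E = [0, 1]
D = [(x, a) for x in E for a in (0, 1)]          # admissible state-action pairs
phi = {0: 1.0, 1: 2.0}                           # gauge function phi >= 1

def d_M(mu, nu):
    # Kolmogorov distance on an ordered finite space: sup-distance of the CDFs.
    return np.max(np.abs(np.cumsum(mu) - np.cumsum(nu)))

P_n = {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5],
       (1, 0): [0.3, 0.7], (1, 1): [0.2, 0.8]}
Q_n = {(0, 0): [0.8, 0.2], (0, 1): [0.5, 0.5],
       (1, 0): [0.3, 0.7], (1, 1): [0.0, 1.0]}

def d_inf(P, Q):
    """One-step version of (11): max over (x, a) of d_M(...) / phi(x)."""
    return max(d_M(P[xa], Q[xa]) / phi[xa[0]] for xa in D)

dist = d_inf(P_n, Q_n)   # for the full (11), additionally take the max over n
```

Note how the weight $1/\phi(x)$ discounts the large discrepancy at the state-action pair $(1, 1)$, which is exactly the "less strict" effect described above.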

Definition of 'differentiability'
Let $\psi$ be any gauge function, and fix some set $\mathbf{P}_\psi' \subseteq \mathbf{P}_\psi$ that is closed under mixtures (i.e. $(1-\varepsilon)P + \varepsilon Q \in \mathbf{P}_\psi'$ for any $P, Q \in \mathbf{P}_\psi'$ and $\varepsilon \in (0,1)$). The set $\mathbf{P}_\psi'$ will be equipped with the distance $d^\phi_{\infty,M}$ introduced in (11). In Definition 7 below we will introduce a reasonable notion of 'differentiability' for an arbitrary functional $V : \mathbf{P}_\psi' \to L$ taking values in a normed vector space $(L, \|\cdot\|_L)$. It is related to the general functional analytic concept of (tangential) $S$-differentiability introduced by Sebastião e Silva (1956) and Averbukh and Smolyanov (1967); see also Fernholz (1983), Gill (1989), Shapiro (1990) for applications. However, $\mathbf{P}_\psi'$ is not a vector space. This implies that Definition 7 differs from the classical notion of (tangential) $S$-differentiability. For that reason we will use inverted commas and write '$S$-differentiability' instead of $S$-differentiability. Due to the missing vector space structure, we in particular need to allow the tangent space to depend on the point $P \in \mathbf{P}_\psi'$ at which $V$ is differentiated. The role of the 'tangent space' will be played by the set $\mathbf{P}^\pm_{P;\psi} := \{Q - P : Q \in \mathbf{P}_\psi'\}$, whose elements can be seen as signed transition functions. In Definition 7 we will employ the following terminology: a map $W : \mathbf{P}^\pm_{P;\psi} \to L$ is said to be $(M,\phi)$-continuous if $W(Q_m - P) \to W(Q - P)$ in $(L, \|\cdot\|_L)$ for every $Q \in \mathbf{P}_\psi'$ and every sequence $(Q_m) \subseteq \mathbf{P}_\psi'$ with $d^\phi_{\infty,M}(Q_m, Q) \to 0$.
For the following definition it is important to note that $P + \varepsilon(Q - P)$ lies in $\mathbf{P}_\psi'$ for any $P, Q \in \mathbf{P}_\psi'$ and $\varepsilon \in (0,1]$.
Definition 7 ('$S$-differentiability') Let $S$ be a family of subsets of $\mathbf{P}_\psi'$. The map $V : \mathbf{P}_\psi' \to L$ is said to be '$S$-differentiable' at $P \in \mathbf{P}_\psi'$ (w.r.t. $(M, \phi)$) if there exists an $(M,\phi)$-continuous map $\dot V_P : \mathbf{P}^\pm_{P;\psi} \to L$ such that
$$\lim_{m \to \infty} \, \sup_{Q \in K} \Big\| \frac{V\big(P + \varepsilon_m(Q - P)\big) - V(P)}{\varepsilon_m} - \dot V_P(Q - P) \Big\|_L = 0 \tag{12}$$
for every $K \in S$ and every sequence $(\varepsilon_m) \subseteq (0,1]$ with $\varepsilon_m \to 0$. In this case $\dot V_P$ is referred to as the '$S$-derivative' of $V$ at $P$.
Note that in Definition 7 the derivative is not required to be linear (in fact the derivative is not even defined on a vector space). This is another point where Definition 7 differs from the functional analytic definition of (tangential) $S$-differentiability. However, non-linear derivatives are common in the field of mathematical optimization; see, for instance, Römisch (2004), Shapiro (1990).
Remark 2 (i) At least in the case $L = \mathbb{R}$, the '$S$-derivative' $\dot V_P$ evaluated at $Q - P$, i.e. $\dot V_P(Q - P)$, can be seen as a measure for the first-order sensitivity of the functional $V : \mathbf{P}_\psi' \to \mathbb{R}$ w.r.t. a change of the argument from $P$ to $(1-\varepsilon)P + \varepsilon Q$, with $\varepsilon > 0$ small, for some given transition function $Q$.
(ii) The prefix '$S$-' in Definition 7 provides the following information. Since the convergence in (12) is required to be uniform in $Q \in K$, the values of the first-order sensitivities $\dot V_P(Q - P)$, $Q \in K$, can be compared with each other with clear conscience for any fixed $K \in S$. It is therefore favorable if the sets in $S$ are large. However, the larger the sets in $S$, the stricter the condition of '$S$-differentiability'.
(iii) The subset $M$ ($\subseteq \mathcal{M}^\psi(E)$) and the gauge function $\phi$ tell us in a way how 'robust' the '$S$-derivative' $\dot V_P$ is w.r.t. changes in $Q$: the smaller the set $M$ and the 'steeper' the gauge function $\phi$, the less strict the metric $d^\phi_{\infty,M}(P, Q)$ (given by (11)) and the more robust $\dot V_P(Q - P)$ in $Q$. It is thus favorable if the set $M$ is small and the gauge function $\phi$ is 'steep'. However, the smaller $M$ and the 'steeper' $\phi$, the stricter the condition of $(M,\phi)$-continuity (and thus of '$S$-differentiability' w.r.t. $(M,\phi)$). More precisely, if $M_1 \subseteq M_2$ and $\phi_1 \ge \phi_2$, then $(M_1, \phi_1)$-continuity implies $(M_2, \phi_2)$-continuity.
(iv) In general the choice of S and the choice of the pair (M, φ) in Definition 7 do not necessarily depend on each other. However in the specific settings (b) and (c) in Definition 8, and in particular in the application in Section 4, they do.
In the general framework of our main result (Theorem 1) we cannot choose $\phi$ 'steeper' than the gauge function $\psi$, which plays the role of a bounding function there; indeed, the proof of $(M,\psi)$-continuity of the map $\dot V_P : \mathbf{P}^\pm_{P;\psi} \to L$ makes essential use of this restriction. The last sentence before Definition 8 and the last sentence in part (iii) of Remark 2 together imply that 'Hadamard (resp. Fréchet) differentiability' w.r.t. $(M, \phi_1)$ implies 'Hadamard (resp. Fréchet) differentiability' w.r.t. $(M, \phi_2)$ when $\phi_1 \ge \phi_2$.
The following lemma, whose proof can be found in Subsection 3.2 of the supplemental article Kern et al. (2020), provides an equivalent characterization of 'Hadamard differentiability'.
Lemma 2 Let M ⊆ M ψ (E), φ be another gauge function, and V : P ψ → L be any map. Fix P ∈ P ψ . Then the following two assertions hold. ( (ii) If there exists an (M, φ)-continuous mapV P : P P;± ψ → L such that (13) holds

'Differentiability' of the value functional
Recall that $A$, $\Pi$, and $r$ are fixed, and let $V^{P;\pi}_n$ and $V^P_n$ be defined as in (5) and (7), respectively. Moreover, let $\psi$ be any gauge function and fix some set $\mathbf{P}_\psi' \subseteq \mathbf{P}_\psi$ that is closed under mixtures.
In view of Lemma 1 (with $\mathbf{P}' := \{P\}$), condition (a) of Theorem 1 below ensures that Assumption (A) is satisfied for any $P \in \mathbf{P}_\psi'$. Then for any $x_n \in E$, $\pi \in \Pi$, and $n = 0, \dots, N$ we may define, under condition (a) of Theorem 1, functionals $V^{x_n;\pi}_n : \mathbf{P}_\psi' \to \mathbb{R}$ and $V^{x_n}_n : \mathbf{P}_\psi' \to \mathbb{R}$ by $V^{x_n;\pi}_n(P) := V^{P;\pi}_n(x_n)$ and $V^{x_n}_n(P) := V^P_n(x_n)$, respectively. Note that $V^{x_n}_n(P)$ specifies the maximal value for the expected total reward in the MDM (given state $x_n$ at time $n$) when the underlying transition function is $P$. By analogy with the name 'value function' we refer to $V^{x_n}_n$ as the value functional given state $x_n$ at time $n$. Part (ii) of Theorem 1 provides (under some assumptions) the 'Hadamard derivative' of the value functional $V^{x_n}_n$ in the sense of Definition 8. Conditions (b) and (c) of Theorem 1 involve the so-called Minkowski (or gauge) functional $\rho_M : \mathcal{M}^\psi(E) \to \overline{\mathbb{R}}_{\ge 0}$ (see, e.g., Rudin (1991, p. 25)) defined by
$$\rho_M(h) := \inf\{\lambda \in \mathbb{R}_{>0} : h/\lambda \in M\},$$
where we use the convention $\inf \emptyset := \infty$, $M$ is any subset of $\mathcal{M}^\psi(E)$, and we set $\mathbb{R}_{>0} := (0, \infty)$. We note that Müller (1997a) also used the Minkowski functional to formulate his assumptions.

Example 6
For the sets M (and the corresponding gauge functions ψ) from Examples 1–5 the Minkowski functionals can be made explicit; in particular, ρ_{M_Lip}(h) = ‖h‖_Lip and ρ_{M_Höl,α}(h) = ‖h‖_Höl,α, where as before M_TV and M_Kolm are used to denote the maximal generator of d_TV and d_Kolm, respectively. The latter equations are trivial; for the equations corresponding to M_TV and M_Kolm see Müller (1997a, p. 880).
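As a toy illustration of the identity ρ_{M_Lip}(h) = ‖h‖_Lip: assuming, as is standard, that M_Lip denotes the set of 1-Lipschitz functions, the smallest λ > 0 with h/λ ∈ M_Lip is exactly the Lipschitz constant of h. The following grid-based sketch is our own numerical illustration (not part of the paper's formal development); it also checks the positive homogeneity ρ_M(c·h) = c·ρ_M(h) of any Minkowski functional.

```python
import numpy as np

# Grid approximation of the Lipschitz constant ||h||_Lip on [0, 2*pi];
# by definition this equals the Minkowski functional of h w.r.t. the
# 1-Lipschitz ball M_Lip (assumption: M_Lip = {h : |h(x)-h(y)| <= |x-y|}).
def lip_const(h, grid):
    vals = h(grid)
    return np.max(np.abs(np.diff(vals)) / np.diff(grid))

grid = np.linspace(0.0, 2.0 * np.pi, 100_001)

L_sin = lip_const(np.sin, grid)                        # should be close to 1
L_scaled = lip_const(lambda x: 3.0 * np.sin(x), grid)  # homogeneity: 3 * L_sin
```

For a smooth h the adjacent difference quotients converge to sup |h′|, so the fine grid recovers ‖sin‖_Lip = 1 up to discretization error.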
Recall from Definition 4 that for given P ∈ P ψ and δ > 0 the sets Π( P; δ) and Π( P) consist of all δ-optimal strategies w.r.t. P and of all optimal strategies w.r.t. P, respectively. Generators M of d M were introduced subsequent to (10).
Theorem 1 ('Differentiability' of V_n^{x_n;π} and V_n^{x_n}) Let M ⊆ M_ψ(E) and let M be any generator of d_M. Fix P = (P_n)_{n=0}^{N−1} ∈ P_ψ, and assume that the following three conditions hold.
(a) ψ is a bounding function for the MDM (X, A, Q, Π, r) for any Q ∈ P ψ .
Then the following two assertions hold.
The proof of Theorem 1 can be found in Section 4 of the supplemental article Kern et al. (2020). Note that the set Π(P; δ) shrinks as δ decreases. Therefore the right-hand side of (17) is well defined. The supremum in (18) ranges over all optimal strategies w.r.t. P. If, for example, the MDM (X, A, P, Π, r) satisfies conditions (a)–(c) of Theorem 2 in the supplemental article Kern et al. (2020), then by part (iii) of this theorem an optimal strategy exists, i.e. Π(P) is non-empty. The existence of an optimal strategy is also ensured if the sets F_0, …, F_{N−1} are finite (a situation one often faces in applications). In the latter case the 'Hadamard derivative' V̇_{n;P}^{x_n}(Q − P) can easily be determined by computing the finitely many values V̇_{n;P}^{x_n;π}(Q − P), π ∈ Π(P), and taking their maximum. The discrete case will be discussed in more detail in Subsection 1.5 of the supplemental article Kern et al. (2020).
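In the finite case the recipe just described can be carried out directly. The sketch below is our own toy example with hypothetical numbers (not taken from the paper): a one-period MDM with two states and two actions, zero running reward and terminal reward r_N. The 'Hadamard derivative' at P in direction Q − P is the maximum of the fixed-strategy directional derivatives over the optimal strategies, and it matches a finite difference of the optimal value along the mixture (1 − ε)P + εQ.

```python
import numpy as np

# One-period toy MDM with two states and two actions (hypothetical numbers):
# running reward 0, terminal reward rN; V_0(P)(x) = max_a sum_y P[a][x, y] rN[y].
rN = np.array([0.0, 1.0])
P = {0: np.array([[0.7, 0.3], [0.4, 0.6]]),
     1: np.array([[0.7, 0.3], [0.5, 0.5]])}
Q = {0: np.array([[0.2, 0.8], [0.9, 0.1]]),
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}

x = 0
vals = {a: P[a][x] @ rN for a in P}                      # value of each action
vstar = max(vals.values())
optimal = [a for a in P if np.isclose(vals[a], vstar)]   # here: both actions

# 'Hadamard derivative' at P in direction Q - P: maximum of the fixed-strategy
# directional derivatives over the optimal strategies.
deriv = max(float((Q[a][x] - P[a][x]) @ rN) for a in optimal)

# Finite-difference check along the mixture (1 - eps) P + eps Q.
eps = 1e-6
v_eps = max(float(((1 - eps) * P[a][x] + eps * Q[a][x]) @ rN) for a in P)
fd = (v_eps - vstar) / eps
```

The example is deliberately chosen so that both actions are optimal at x = 0: the derivative is then the maximum of two distinct directional derivatives, which is exactly why the supremum over Π(P) in (18) matters.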
If there exists a unique optimal strategy π_P ∈ Π w.r.t. P, then Π(P) is nothing but the singleton {π_P}, and in this case the 'Hadamard derivative' V̇^{x_0}_{0;P} of the optimal value (functional) V^{x_0}_0 at P coincides with V̇^{x_0;π_P}_{0;P}.

Remark 3 (i)
The 'Fréchet differentiability' in part (i) of Theorem 1 holds even uniformly in π ∈ Π ; see Theorem 1 in the supplemental article Kern et al. (2020) for the precise meaning.
(ii) We do not know whether it is possible to replace 'Hadamard differentiability' by 'Fréchet differentiability' in part (ii) of Theorem 1. The following arguments rather cast doubt on this possibility. The proof of part (ii) is based on the decomposition of the value functional V_n^{x_n} in display (26) of the supplemental article Kern et al. (2020) and a suitable chain rule, where this decomposition involves the sup-functional Ψ introduced in display (27) of the supplemental article Kern et al. (2020). However, Corollary 1 in Cox and Nadler (1971) (see also Proposition 4.6.5 in Schirotzek 2007) shows that in normed vector spaces sup-functionals are in general not Fréchet differentiable. This could be an indication that 'Fréchet differentiability' of the value functional indeed fails. We cannot make a reliable statement in this regard.
(iii) In the case where we are interested in minimizing expected total costs in the MDM (X, A, P, Π, r) (see Remark 1(ii)), we obtain under the assumptions of Theorem 1 (and with the same arguments as in the proof of part (ii)) that the 'Hadamard derivative' of the corresponding value functional is given by (17) (resp. (18)) with "sup" replaced by "inf". (iv) In some situations, condition (a) implies condition (b) in Theorem 1. This is the case, for instance, in the following four settings (the involved sets M and metrics were introduced in Examples 1–5).
In applications it is not necessarily easy to specify the set Π( P) of all optimal strategies w.r.t. P. While in most cases an optimal strategy can be found with little effort (one can use the Bellman equation; see part (i) of Theorem 2 in Section 6 of the supplemental article Kern et al. 2020), it is typically more involved to specify all optimal strategies or to show that the optimal strategy is unique. The following remark may help in some situations; for an application see Sect. 4.4.

Remark 5
In some situations it turns out that for every P ∈ P_ψ the solution of the optimization problem (6) does not change if Π is replaced by a subset Π′ ⊆ Π (being independent of P). Then in the definition (7) of the value function (at time 0) the set Π can be replaced by the subset Π′, and it follows (under the assumptions of Theorem 1) that in the representation (18) of the 'Hadamard derivative' V̇^{x_0}_{0;P} of V^{x_0}_0 at P the set Π(P) can be replaced by the set Π′(P) of all optimal strategies w.r.t. P from the subset Π′. Of course, in this case it suffices to ensure that conditions (a)–(b) of Theorem 1 are satisfied for the subset Π′ instead of Π.

Two alternative representations of V̇_{n;P}^{x_n;π}
In this subsection we present two alternative representations (see (19) and (20)) of the 'Fréchet derivative' V̇_{n;P}^{x_n;π} in (16). The representation (19) will be beneficial for the proof of Theorem 1 (see Lemma 3 in Subsection 4.1 of the supplemental article Kern et al. 2020), and the representation (20) will be used to derive the 'Hadamard derivative' of the optimal value of the terminal wealth problem in (28) below (see the proof of Theorem 3 in Subsection 5.3 of the supplemental article Kern et al. 2020).
Remark 6 (Representation I) By rearranging the sums in (16), we obtain under the assumptions of Theorem 1 that for every fixed P = (P_n)_{n=0}^{N−1} ∈ P_ψ the 'Fréchet derivative' V̇_{n;P}^{x_n;π} of V_n^{x_n;π} at P can be represented as in (19).

Remark 7 (Representation II) For every fixed P = (P_n)_{n=0}^{N−1} ∈ P_ψ, and under the assumptions of Theorem 1, the 'Fréchet derivative' V̇_{n;P}^{x_n;π} of V_n^{x_n;π} at P admits the representation V̇_{n;P}^{x_n;π}(Q − P) = V̇_n^{P,Q;π}(x_n), where the maps V̇_n^{P,Q;π} are defined through the backward iteration scheme (21). Indeed, it is easily seen that V̇_n^{P,Q;π}(x_n) coincides with the right-hand side of (19). Note that it can be verified iteratively by means of condition (a) of Theorem 1 and Lemma 1 (with P := {Q}) that V̇_n^{P,Q;π}(·) ∈ M_ψ(E) for every Q ∈ P_ψ, π ∈ Π, and n = 0, …, N. In particular, this implies that the integrals on the right-hand side of (21) exist and are finite. Also note that the iteration scheme (21) involves the family (V_k^{P;π})_{k=1}^N, which itself can be seen as the solution of a backward iteration scheme; see Proposition 1 of the supplemental article Kern et al. (2020).
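For a finite state space the backward iteration scheme (21) is easy to run. The sketch below is our own toy illustration (hypothetical transition matrices, with a fixed strategy already substituted and zero running reward); it assumes the product-rule form of the recursion, in which the derivative at step n propagates through P_n while the already-computed values V_{n+1}^{P;π} are integrated against Q_n − P_n, and it cross-checks the result against a finite difference along the mixture (1 − ε)P + εQ.

```python
import numpy as np

# Toy finite-state chain, fixed strategy already substituted into the
# transition matrices: P[n] is the one-step matrix at time n (hypothetical).
N = 2
rN = np.array([0.0, 1.0, 0.5])                      # terminal reward
P = [np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]),
     np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]])]
Q = [np.array([[0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.5, 0.1, 0.4]]),
     np.array([[0.1, 0.8, 0.1], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]])]

def value(trans):
    V = rN.copy()
    for n in reversed(range(N)):
        V = trans[n] @ V        # no running reward in this toy example
    return V

# Fixed-strategy values V[n] under P, computed backward.
V = [None] * (N + 1)
V[N] = rN
for n in reversed(range(N)):
    V[n] = P[n] @ V[n + 1]

# Product-rule backward recursion for the directional derivative at P
# in direction Q - P (a sketch of 'Representation II').
dV = np.zeros_like(rN)
for n in reversed(range(N)):
    dV = P[n] @ dV + (Q[n] - P[n]) @ V[n + 1]

# Finite-difference check along the mixture (1 - eps) P + eps Q.
eps = 1e-7
mix = [(1 - eps) * P[n] + eps * Q[n] for n in range(N)]
fd = (value(mix) - value(P)) / eps
```

For a fixed strategy the value is multilinear in the kernels (P_0, P_1), so the recursion reproduces the exact directional derivative D_0 P_1 r_N + P_0 D_1 r_N with D_n := Q_n − P_n.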

Application to a terminal wealth optimization problem in mathematical finance
In this section we will apply the theory of Sections 2–3 to a particular optimization problem in mathematical finance. At first, we introduce in Sect. 4.1 the basic financial market model and subsequently formulate the terminal wealth problem as a classical optimization problem in mathematical finance. The market model is in line with standard literature such as Bäuerle and Rieder (2011, Chapter 4) or Föllmer and Schied (2011, Chapter 5). To keep the presentation as clear as possible we restrict ourselves to a simple variant of the market model (only one risky asset). In Sect. 4.2 we will see that the market model can be embedded into the MDM of Sect. 2. It turns out that the existence (and computation) of an optimal (trading) strategy can be obtained by solving iteratively N one-stage investment problems; see Sect. 4.3. In Sect. 4.4 we will specify the 'Hadamard derivative' of the optimal value functional of the terminal wealth problem, and Sect. 4.5 provides some numerical examples for the 'Hadamard derivative'.

Basic financial market model, and the target
Consider an N-period financial market consisting of one riskless bond B = (B_0, …, B_N) and one risky asset S = (S_0, …, S_N). Further assume that the value of the bond evolves deterministically according to B_0 := 1 and B_n := r_n B_{n−1}, n = 1, …, N, for some fixed constants r_1, …, r_N ∈ R_{≥1}, and that the value of the asset evolves stochastically according to S_n := R_n S_{n−1}, n = 1, …, N (with fixed initial price S_0 ∈ R_{>0}), for some independent R_{≥0}-valued random variables R_1, …, R_N on some probability space (Ω, F, P) with (known) distributions m_1, …, m_N, respectively. Throughout Section 4 we will assume that the financial market satisfies the following Assumption (FM), where α ∈ (0, 1) is fixed and chosen as in (24) below.
In Examples 7 and 8 we will discuss specific financial market models which satisfy Assumption (FM).
Note that for any n = 0, …, N − 1 the value r_{n+1} (resp. R_{n+1}) corresponds to the relative price change B_{n+1}/B_n (resp. S_{n+1}/S_n) of the bond (resp. asset) between time n and n + 1. Let F_0 be the trivial σ-algebra, and set F_n := σ(S_0, …, S_n) = σ(R_1, …, R_n) for any n = 1, …, N. Now, an agent invests a given amount of capital x_0 ∈ R_{≥0} in the bond and the asset according to some self-financing trading strategy. By trading strategy we mean an (F_n)-adapted R²_{≥0}-valued stochastic process ϕ = (ϕ⁰_n, ϕ_n)_{n=0}^{N−1}, where ϕ⁰_n (resp. ϕ_n) specifies the amount of capital that is invested in the bond (resp. asset) during the time interval [n, n+1). Here we require that both ϕ⁰_n and ϕ_n are nonnegative for any n, which means that taking loans and short selling of the asset are excluded. The corresponding portfolio process X^ϕ = (X^ϕ_n)_{n=0}^N is given by X^ϕ_0 := x_0 and X^ϕ_n := r_n ϕ⁰_{n−1} + R_n ϕ_{n−1}, n = 1, …, N. A trading strategy ϕ = (ϕ⁰_n, ϕ_n)_{n=0}^{N−1} is said to be self-financing w.r.t. the initial capital x_0 if x_0 = ϕ⁰_0 + ϕ_0 and X^ϕ_n = ϕ⁰_n + ϕ_n for all n = 1, …, N − 1. It is easily seen that for any self-financing trading strategy ϕ = (ϕ⁰_n, ϕ_n)_{n=0}^{N−1} w.r.t. x_0 the corresponding portfolio process admits the representation X^ϕ_{n+1} = r_{n+1}(X^ϕ_n − ϕ_n) + R_{n+1} ϕ_n, n = 0, …, N − 1. Note that X^ϕ_n − ϕ_n corresponds to the amount of capital which is invested in the bond between time n and n + 1. Also note that it can be verified easily by means of Remark 3.1.6 in Bäuerle and Rieder (2011) that under condition (c) of Assumption (FM) the financial market introduced above is free of arbitrage opportunities.
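The self-financing recursion is straightforward to simulate. The sketch below is our own illustration (all parameter values hypothetical): it runs the recursion X_{n+1} = r_{n+1}(X_n − ϕ_n) + R_{n+1} ϕ_n under a linear strategy ϕ_n = γ X_n, for which the terminal wealth factorizes into per-period factors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, x0, gamma = 12, 1.0, 0.5       # horizon, initial capital, fixed fraction
r = np.exp(0.02 / N)              # riskless gross return per period
# i.i.d. lognormal gross asset returns (hypothetical parameters)
R = rng.lognormal(mean=(0.05 - 0.5 * 0.2**2) / N, sigma=0.2 / np.sqrt(N), size=N)

# Self-financing wealth recursion with the linear strategy phi_n = gamma * X_n.
X = x0
for n in range(N):
    phi = gamma * X
    X = r * (X - phi) + R[n] * phi

# For a linear strategy the terminal wealth factorizes per period:
# X_N = x0 * prod_n (r (1 - gamma) + gamma R_{n+1}).
X_closed = x0 * np.prod(r * (1.0 - gamma) + gamma * R)
```

The factorization is the reason linear strategies are analytically convenient in this model: the wealth stays nonnegative and the terminal utility separates over periods.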
In view of (22), we may and do identify a self-financing trading strategy w.r.t. x_0 with an (F_n)-adapted R_{≥0}-valued stochastic process ϕ = (ϕ_n)_{n=0}^{N−1} satisfying ϕ_0 ∈ [0, x_0] and ϕ_n ∈ [0, X^ϕ_n] for all n = 1, …, N − 1. We restrict ourselves to Markovian self-financing trading strategies ϕ = (ϕ_n)_{n=0}^{N−1} w.r.t. x_0, which means that ϕ_n only depends on n and X^ϕ_n. To put it another way, we assume that for any n = 0, …, N − 1 there exists some Borel measurable map f_n : R_{≥0} → R_{≥0} such that ϕ_n = f_n(X^ϕ_n). Then, in particular, X^ϕ is an R_{≥0}-valued (F_n)-Markov process whose one-step transition probability at time n ∈ {0, …, N − 1} given state x ∈ R_{≥0} and strategy ϕ = (ϕ_n)_{n=0}^{N−1} (resp. π = (f_n)_{n=0}^{N−1}) is given by m_{n+1} ∘ η^{−1}_{n,(x,f_n(x))}, where η_{n,(x,a)}(y) := (x − a) r_{n+1} + a y (see (23)). The agent's aim is to find a self-financing trading strategy ϕ = (ϕ_n)_{n=0}^{N−1} (resp. π = (f_n)_{n=0}^{N−1}) w.r.t. x_0 for which her expected utility of the discounted terminal wealth is maximized. We assume that the agent is risk averse and that her attitude towards risk is set via the power utility function u_α : R_{≥0} → R_{≥0} defined by u_α(x) := x^α for some fixed α ∈ (0, 1) (as in Assumption (FM)); see (24). The coefficient α determines the degree of risk aversion of the agent: the smaller the coefficient α, the greater her risk aversion. Hence the agent is interested in those self-financing trading strategies that maximize the expected utility of the discounted terminal wealth. In the following subsections we will assume for notational simplicity that r_1, …, r_N are fixed and that m_1, …, m_N are a sort of model parameters. In this case the factor (25) is superfluous; it indeed does not influence the maximization problem or any 'derivative' of the optimal value. On the other hand, if also the (Dirac-)distributions of r_1, …, r_N were allowed to be variable, then this factor could matter for the derivative of the optimal value w.r.t. changes in the (deterministic) dynamics of B_N.

Embedding into MDM, and optimal trading strategies
The setting introduced in Sect. 4.1 can be embedded into the setting of Sections 2–3 as follows. Let r_1, …, r_N ∈ R_{≥1} be a priori fixed constants. Let the state space and the action space both be R_{≥0}, and let the set of admissible actions in state x be [0, x] for all x ∈ R_{≥0} (in particular F_n is independent of n). For any n = 0, …, N − 1, let the set F_n of all admissible decision rules at time n be equal to F_n. Let as before Π := F_0 × · · · × F_{N−1}.
Moreover let r_n :≡ 0 for any n = 0, …, N − 1, and let the terminal reward function r_N be given by the utility function u_α. Consider the gauge function ψ : R_{≥0} → R_{≥1} defined in (26). Let P_ψ be the set of all transition functions P = (P_n)_{n=0}^{N−1} ∈ P consisting of transition kernels of the shape (27), where M^α_1(R_{≥0}) is the set of all μ ∈ M_1(R_{≥0}) satisfying ∫_{R_{≥0}} u_α dμ < ∞, and the map η_{n,(x,a)} is defined as in (23). In particular, P_ψ ⊆ P_ψ (with P_ψ defined as in Sect. 3.3), and (1 − ε)P + εQ ∈ P_ψ for all P, Q ∈ P_ψ and ε ∈ (0, 1) (i.e. P_ψ is closed under mixtures). Moreover it can be verified easily that ψ given by (26) is a bounding function for the MDM (X, A, Q, Π, r) for any Q ∈ P_ψ (see Lemma 7(i) of the supplemental article Kern et al. 2020). Note that X plays the role of the portfolio process X^ϕ from Sect. 4.1. Also note that for some fixed x_0 ∈ R_{≥0}, any self-financing trading strategy ϕ = (ϕ_n)_{n=0}^{N−1} w.r.t. x_0 may be identified with some π = (f_n)_{n=0}^{N−1} ∈ Π via ϕ_n = f_n(X^ϕ_n). Then, for every fixed x_0 ∈ R_{≥0} and P ∈ P_ψ the terminal wealth problem introduced in the second to last paragraph of Sect. 4.1 takes the form of the maximization problem (28). A strategy π_P ∈ Π is called an optimal (self-financing) trading strategy w.r.t. P (and x_0) if it solves the maximization problem (28).

Remark 8
In the setting of Sect. 4.1 we restrict ourselves to Markovian self-financing trading strategies. Of course, one could also assume that the decision rules of a trading strategy π depend on past actions and past values of the portfolio process X^ϕ. However, as already discussed in Remark 1(i), the corresponding history-dependent trading strategies do not lead to an improved optimal value for the terminal wealth problem (28).

Computation of optimal trading strategies
In this subsection we discuss the existence and computation of solutions to the terminal wealth problem (28), maintaining the notation of Sect. 4.2. We will adapt the arguments of Section 4.2 in Bäuerle and Rieder (2011). As before r 1 , . . . , r N ∈ R ≥1 are fixed constants.
Basically the existence of an optimal trading strategy for the terminal wealth problem (28) can be ensured with the help of a suitable analogue of Theorem 4.2.2 in Bäuerle and Rieder (2011). In order to specify the optimal trading strategy explicitly one has to determine the local maximizers in the Bellman equation; see Theorem 2(i) in Section 6 of the supplemental article Kern et al. (2020). However this is not necessarily easy. On the other hand, part (ii) of Theorem 2 ahead (a variant of Theorem 4.2.6 in Bäuerle and Rieder 2011) shows that, for our particular choice of the utility function (recall (24)), the optimal investment in the asset at time n ∈ {0, . . . , N − 1} has a rather simple form insofar as it depends linearly on the wealth. The respective coefficient can be obtained by solving the one-stage optimization problem in (29) ahead. That is, instead of finding the optimal amount of capital (possibly depending on the wealth) to be invested in the asset, it suffices to find the optimal fraction of the wealth (being independent of the wealth itself) to be invested in the asset.
For the formulation of the one-stage optimization problem note that every transition function P ∈ P_ψ is generated through (27) by some (m_1, …, m_N) ∈ M^α_1(R_{≥0})^N. For every P ∈ P_ψ, we use (m^P_1, …, m^P_N) to denote any such set of 'parameters'. Now, consider for any P ∈ P_ψ and n = 0, …, N − 1 the optimization problem

v^P_n := sup_{γ ∈ [0,1]} v^{P;γ}_n with v^{P;γ}_n := ∫_{R_{≥0}} u_α(1 + γ(y/r_{n+1} − 1)) m^P_{n+1}(dy).   (29)

Note that 1 + γ(y/r_{n+1} − 1) lies in R_{≥0} for any γ ∈ [0, 1] and y ∈ R_{≥0}, and that the integral in (29) exists and is finite (this follows from displays (34)–(36) in Subsection 5.1 of the supplemental article Kern et al. 2020); it should be seen as the expectation of u_α(1 + γ(R_{n+1}/r_{n+1} − 1)) under P.
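The one-stage problem (29) is a scalar concave maximization over [0, 1] and can be solved numerically. The sketch below is our own Monte Carlo and grid-search illustration (the parameters match the numerical section, but the procedure is not the paper's); note that γ = 0 gives the value u_α(1) = 1.

```python
import numpy as np

alpha, N = 0.5, 12
mu, sigma, nu = 0.05, 0.2, 0.02        # asset drift/vol and bond drift
r = np.exp(nu / N)                     # per-period riskless gross return

rng = np.random.default_rng(1)
R = rng.lognormal((mu - sigma**2 / 2) / N, sigma / np.sqrt(N), size=200_000)

def g(gamma):
    # Monte Carlo version of the integral in (29)
    return np.mean((1.0 + gamma * (R / r - 1.0)) ** alpha)

grid = np.linspace(0.0, 1.0, 1001)
vals = np.array([g(gamma) for gamma in grid])
gamma_star = grid[np.argmax(vals)]     # approximate maximizer of (29)
v_star = vals.max()
```

Since x ↦ u_α(1 + γ(x/r − 1)) is concave in γ for each sample, the empirical objective is concave, so the grid search is reliable; for these particular parameters the maximizer sits at the boundary of [0, 1].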
The following lemma, whose proof can be found in Subsection 5.1 of the supplemental article Kern et al. (2020), shows in particular that the one-stage optimization problem (29) admits a unique solution γ^P_n ∈ [0, 1]. Part (i) of the following Theorem 2 involves the value function introduced in (7). In the present setting this function has a comparatively simple form (see (30)) for any x_n ∈ R_{≥0}, P ∈ P_ψ, and n = 0, …, N. Part (ii) involves the subset Π_lin of Π which consists of all linear trading strategies, i.e. of all π = (f_n)_{n=0}^{N−1} ∈ Π whose decision rules are of the form f_n(x) := γ_n x for some γ_n ∈ [0, 1]. In part (i) and elsewhere we use the convention that the product over the empty set is 1.
Theorem 2 (Optimal trading strategy) For any P ∈ P ψ the following two assertions hold.
(i) The value function V^P_n given by (30) admits the representation V^P_n(x_n) = u_α(x_n) · v̄^P_n for any x_n ∈ R_{≥0} and n = 0, …, N − 1, where v̄^P_n := ∏_{k=n}^{N−1} v^P_k.
(ii) For any n = 0, …, N − 1, let γ^P_n ∈ [0, 1] be the unique solution to the optimization problem (29), and define a decision rule f^P_n : R_{≥0} → R_{≥0} at time n through (32), i.e. f^P_n(x) := γ^P_n x. Then π_P := (f^P_n)_{n=0}^{N−1} ∈ Π_lin forms an optimal trading strategy w.r.t. P. Moreover, there is no further optimal trading strategy w.r.t. P which belongs to Π_lin.
The proof of Theorem 2 can be found in Subsection 5.2 of the supplemental article Kern et al. (2020). The second assertion of part (ii) of Theorem 2 will be beneficial for part (ii) of Theorem 3 ahead; for details see Remark 9. The following two Examples 7 and 8 illustrate part (ii) of Theorem 2.
Example 7 (Cox–Ross–Rubinstein model) Let r_1 = · · · = r_N = r for some r ∈ R_{≥1}. Moreover let P ∈ P be any transition function defined as in (27) with m_1 = · · · = m_N = m_P for some m_P := p_P δ_{u_P} + (1 − p_P) δ_{d_P}, where p_P ∈ [0, 1] and d_P, u_P ∈ R_{>0} are some given constants (depending on P) satisfying d_P < r < u_P. Then P ∈ P_ψ and conditions (a)–(c) of Assumption (FM) are clearly satisfied. In particular, the corresponding financial market is arbitrage-free, and the optimization problem (29) simplifies (up to the factor r^{−α}) to

sup_{γ ∈ [0,1]} { p_P u_α(r + γ(u_P − r)) + (1 − p_P) u_α(r + γ(d_P − r)) }.   (33)

Lemma 3 ensures that (33) has a unique solution, γ^P_CRR, and it can be checked easily (see, e.g., Bäuerle and Rieder (2011, p. 86)) that this solution admits an explicit representation (34) in terms of κ_α := (1 − α)^{−1} and two thresholds p_{P,0} < p_{P,1}. Note that only fractions from the interval [0, 1] are admissible, and that the expression in the middle line of (34) lies in (0, 1) when p_P ∈ (p_{P,0}, p_{P,1}). Thus, part (ii) of Theorem 2 shows that the strategy π^P_CRR defined by (32) (with γ^P_n replaced by γ^P_CRR) is optimal w.r.t. P and is the unique optimal strategy within Π_lin.
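The two-point problem (33) can also be checked numerically without invoking the explicit formula (34). The sketch below is our own illustration (hypothetical parameters, chosen so that the maximizer is interior); it locates γ^P_CRR by grid search and verifies the first-order condition at the maximizer.

```python
import numpy as np

alpha = 0.25                          # high risk aversion: maximizer is interior
p, u, d, r = 0.53, 1.2, 0.9, 1.05     # hypothetical CRR parameters, d < r < u

def h(gamma):
    # objective of (33): p u_alpha(r + gamma (u - r)) + (1 - p) u_alpha(r + gamma (d - r))
    return p * (r + gamma * (u - r)) ** alpha \
        + (1.0 - p) * (r + gamma * (d - r)) ** alpha

grid = np.linspace(0.0, 1.0, 100_001)
gamma_crr = grid[np.argmax(h(grid))]

# First-order condition at an interior maximizer: h'(gamma_crr) should vanish.
step = 1e-6
foc = (h(gamma_crr + step) - h(gamma_crr - step)) / (2.0 * step)
```

Because h is strictly concave on [0, 1] (for α ∈ (0, 1) and d < r < u), the grid maximizer is the unique solution and the central difference of h at it is close to zero.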
Example 8 (Discretized Black–Scholes–Merton model) In the following example the bond and the asset evolve according to the ordinary differential equation dB_t = νB_t dt and the Itô stochastic differential equation dS_t = μS_t dt + σS_t dW_t, respectively, where ν, μ ∈ R_{≥0} and σ ∈ R_{>0} are constants and W is a one-dimensional standard Brownian motion. We assume that the trading period is (without loss of generality) the unit interval [0, 1] and that the bond and the asset can be traded only at N equidistant time points in [0, 1], namely at t_{N,n} := n/N, n = 0, …, N − 1. Then, in particular, the relative price changes r_{n+1} := B_{n+1}/B_n = B_{t_{N,n+1}}/B_{t_{N,n}} and R_{n+1} := S_{n+1}/S_n = S_{t_{N,n+1}}/S_{t_{N,n}} are given by r_{n+1} = exp(ν/N) and R_{n+1} = exp((μ − σ²/2)/N + σ(W_{t_{N,n+1}} − W_{t_{N,n}})), respectively. In particular, R_{n+1} is distributed according to the log-normal distribution LN_{(μ−σ²/2)/N, σ²/N} for any n = 0, …, N − 1.
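A quick sanity check on this discretization (our own illustration, not from the paper): since R_{n+1} is LN_{(μ−σ²/2)/N, σ²/N}, its mean is exp(μ/N), i.e. the per-period expected gross return of the asset, which dominates r_{n+1} = exp(ν/N) whenever μ > ν.

```python
import numpy as np

mu, sigma, nu, N = 0.05, 0.2, 0.02, 12
m, s2 = (mu - sigma**2 / 2) / N, sigma**2 / N

# Exact lognormal mean: E[R] = exp(m + s2 / 2) = exp(mu / N).
exact_mean = np.exp(m + s2 / 2)

rng = np.random.default_rng(3)
R = rng.lognormal(m, np.sqrt(s2), size=1_000_000)
mc_mean = R.mean()

r = np.exp(nu / N)   # per-period gross bond return
```

Note that `Generator.lognormal` is parameterized by the mean and standard deviation of the underlying normal, exactly the parameters of LN_{(μ−σ²/2)/N, σ²/N}.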

'Hadamard derivative' of the optimal value functional
Maintain the notation and terminology introduced in Sects. 4.1-4.3. In this subsection we will specify the 'Hadamard derivative' of the optimal value functional of the terminal wealth problem (28) at (fixed) P; see part (ii) of Theorem 3. Recall that α ∈ (0, 1) introduced in (24) is fixed and determines the degree of risk aversion of the agent.
By the choice of the gauge function ψ (see (26)) we may choose M := M_Höl,α (with M_Höl,α introduced in Example 5) in the setting of Sect. 3.5. Note that ψ coincides with the corresponding gauge function in Example 5 with x := 0. That is, in the end the metric d^ψ_{∞,M_Höl,α} (as defined in (11)) on P_ψ is used to measure the distance between transition functions.
For the formulation of Theorem 3 recall from (14) the definition of the functionals V^{x_0;π}_0 and V^{x_0}_0, where the maps V^{P;π}_0 and V^P_0 are given by (5) and (7), respectively. In the specific setting of Sect. 4.2 we know from (30) that these functionals admit the explicit representation (37) for any x_0 ∈ R_{≥0}, P ∈ P_ψ, and π ∈ Π. Further recall that any γ = (γ_n)_{n=0}^{N−1} ∈ [0, 1]^N induces a linear trading strategy π_γ := (f_n)_{n=0}^{N−1} through f_n(x) := γ_n x.

Theorem 3 ('Differentiability' of V^{x_0;π_γ}_0 and V^{x_0}_0) In the setting above let x_0 ∈ R_{≥0}, γ ∈ [0, 1]^N, and P ∈ P_ψ. Then the following two assertions hold. (i) The functional V^{x_0;π_γ}_0 given by (37) is 'Fréchet differentiable' at P w.r.t. (M_Höl,α, ψ) with 'Fréchet derivative' V̇^{x_0;π_γ}_{0;P} : P^{P;±}_ψ → R given by (38), where the map v̇^{P,Q;π_γ}_0 appearing there is defined in the subsequent display. (ii) The value functional V^{x_0}_0 is 'Hadamard differentiable' at P w.r.t. (M_Höl,α, ψ), and its 'Hadamard derivative' V̇^{x_0}_{0;P} admits the representation (39).

Remark 9 Basically, Theorem 1 yields the first "=" in (39) with Π_lin(P) replaced by Π(P). Since part (ii) of Theorem 2 ensures that for any P ∈ P_ψ there exists an optimal trading strategy which belongs to Π_lin, we may, for any P ∈ P_ψ, replace the set Π by Π_lin (⊆ Π) in the representation (30) of the value function V^P_0(x_0) (or, equivalently, in the representation (37) of the value functional V^{x_0}_0(P)). Therefore one can use Theorem 1 to derive the first "=" in (39). The second "=" in (39) is ensured by the second assertion in part (ii) of Theorem 2. For details see the proof, which is carried out in Subsection 5.3 of the supplemental article Kern et al. (2020).

Numerical examples for the 'Hadamard derivative'
In this subsection we quantify by means of the 'Hadamard derivative' (of the optimal value functional V x 0 0 ) the effect of incorporating an unlikely but significant jump in the dynamics S = (S 0 , . . . , S N ) of an asset price on the optimal value of the corresponding terminal wealth problem (28). At the end of this subsection we will also study the effect of incorporating more than one jump.
We specifically focus on the setting of the discretized Black-Scholes-Merton model from Example 8 with (mainly) N = 12. That is, we let r 1 = · · · = r N = r for r := exp(ν/N ), where ν ∈ R ≥0 . Moreover let P correspond to m 1 = · · · = m N = m P for m P := LN (μ P −σ 2 P /2)/N ,σ 2 P /N , where μ P ∈ R ≥0 and σ P ∈ R >0 are chosen such that μ P > (1 − α)σ 2 P . In fact we let specifically μ P = 0.05 and σ P = 0.2. This set of parameters is often used in numerical examples in the field of mathematical finance; see, e.g., Lemor et al. (2006, p. 898). For the initial state we choose x 0 = 1. For the drift ν of the bond we will consider different values, all of them lying in {0.01, 0.02, 0.03, 0.035, 0.04}. Moreover, we let (mainly) α ∈ {0.25, 0.5, 0.75}. Recall that α determines the degree of risk aversion of the agent; a small α corresponds to high risk aversion.
By a price jump at a fixed time n ∈ {0, …, N − 1} we mean that the asset's return R_{n+1} is no longer drawn from m_P but is given by a deterministic value Δ ∈ R_{≥0} essentially 'away' from 1. As appears from Table 1, in the case N = 12 it seems reasonable to speak of a 'jump' at least if Δ ≤ 0.8 or Δ ≥ 1.25: the probability under m_P of a realized return smaller than 0.8 (resp. larger than 1.25) is smaller than 0.0001. A realized return of ≤ 0.5 (resp. ≥ 1.5) is practically impossible; its probability under m_P is smaller than 10^{−30} (resp. 10^{−10}). That is, the choice Δ = 0.5 or Δ = 1.5 doubtlessly corresponds to a significant price jump.
Remark 10 As mentioned before, the 'Hadamard derivative' V̇^{x_0}_{0;P} evaluated at Q_{Δ,τ} − P can be seen as the first-order sensitivity of the optimal value V^{x_0}_0(P) w.r.t. a change of P to (1 − ε)P + εQ_{Δ,τ}, with ε > 0 small. It is natural to wish to compare these values for different Δ ∈ R_{≥0}. In Subsection 5.4 of the supplemental article Kern et al. (2020) it is proven that the family {Q_{Δ,τ} : Δ ∈ [0, δ]} is relatively compact w.r.t. d^ψ_{∞,M_Höl,α} for any fixed δ ∈ R_{>0} (and τ ∈ {0, …, N − 1}, α ∈ (0, 1)); the proof does not work if d^ψ_{∞,M_Höl,α} is replaced by d^φ_{∞,M_Höl,α} for any gauge function φ 'flatter' than ψ. As a consequence, the approximation (1) with Q = Q_{Δ,τ} holds uniformly in Δ ∈ [0, δ].

By Remark 10 and (41) we are able to compare the effect of incorporating different 'jumps' Δ in the dynamics S = (S_0, …, S_N) of an asset price on the optimal value (functional) V^{x_0}_0(P). As appears from Fig. 1, the negative effect of incorporating a 'jump' Δ = 0.5 in the dynamics of the asset price is larger (in absolute value) than the positive effect of incorporating a 'jump' Δ = 1.5, for every choice of the agent's degree of risk aversion. Figure 1 also shows the unsurprising effect that a high risk aversion (small value of α) leads to a negligible sensitivity.
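To see the asymmetry between Δ = 0.5 and Δ = 1.5 numerically, the following sketch (our own simplification, not the paper's computation) treats a single period (N = 1) with μ_P = 0.05, σ_P = 0.2, ν = 0.02, α = 0.5. For a fixed optimal fraction γ*, the first-order sensitivity of the optimal value to mixing in the point mass at Δ is, by an envelope-theorem argument, the difference E_{Q_Δ}[u_α(1 + γ*(R/r − 1))] − E_P[u_α(1 + γ*(R/r − 1))].

```python
import numpy as np

alpha, mu, sigma, nu = 0.5, 0.05, 0.2, 0.02
r = np.exp(nu)                         # single period: N = 1

rng = np.random.default_rng(2)
R = rng.lognormal(mu - sigma**2 / 2, sigma, size=400_000)

def obj(gamma, returns):
    return np.mean((1.0 + gamma * (returns / r - 1.0)) ** alpha)

grid = np.linspace(0.0, 1.0, 1001)
gamma_star = grid[np.argmax([obj(g, R) for g in grid])]
v_star = obj(gamma_star, R)

def sensitivity(delta):
    # directional derivative toward Q_delta = point mass at the jump return delta
    return (1.0 + gamma_star * (delta / r - 1.0)) ** alpha - v_star

down, up = sensitivity(0.5), sensitivity(1.5)
```

With these parameters `down` is negative, `up` is positive, and `|down| > up`, reproducing qualitatively the asymmetry of the jump directions discussed around Fig. 1.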
Next we compare the values of V̇^{x_0}_{0;P}(Q_{Δ,τ} − P) for trading horizons N ∈ {4, 12, 52} in dependence on the drift ν of the bond and the 'jump' Δ. These choices of N correspond respectively to a quarterly, monthly, and weekly time discretization. We will restrict ourselves to 'jumps' Δ ≤ 0.8. On the one hand, this ensures that the 'jumps' are significant; see the discussion above. On the other hand, as just discerned from Fig. 1, the effect of jumps 'down' is more significant than that of jumps 'up'.
From Fig. 2 one can see that for each trading horizon N and any Δ ∈ [0, 0.8] the (negative) effect of incorporating a 'jump' Δ in the dynamics S = (S_0, …, S_N) of an asset price is the smaller, the smaller the spread between the drift μ_P of the asset and the drift ν of the bond. There is only a tiny (nearly invisible) difference between the 'Hadamard derivatives' V̇^{x_0}_{0;P}(Q_{Δ,τ} − P) for the trading horizons N ∈ {4, 12, 52}. So the fineness of the discretization seems to play a minor part.
Next we compare the values of V̇^{x_0}_{0;P}(Q_{Δ,τ} − P) for the drifts ν ∈ {0.02, 0.03, 0.04} of the bond in dependence on the risk aversion parameter α and the 'jump' Δ. As appears from Fig. 3, for any Δ ∈ [0, 0.8] the (negative) effect of incorporating a 'jump' Δ in the dynamics S = (S_0, …, S_N) of an asset price is the smaller, the higher the agent's risk aversion, no matter which drift ν ∈ {0.02, 0.03, 0.04} of the bond is considered. Take into account that the extent of this effect is influenced via (41)–(43) by the optimal fraction γ^P_BSM to be invested into the asset, which in turn depends on the risk aversion parameter α (see (36)).

Supplement
The supplement Kern et al. (2020) illustrates the setting of Sects. 2-3 in the case of finite state space and finite action spaces, and contains the proofs of the results from Sects. 3-4. Moreover, supplemental definitions and results to Sect. 2 are given and the existence of optimal strategies in general MDMs is discussed. Finally, a supplemental topological result is shown.