Dynamically consistent investment under model uncertainty: the robust forward criteria

We combine forward investment performance processes and ambiguity-averse portfolio selection. We introduce robust forward criteria which address ambiguity in the specification of the model, the risk preferences and the investment horizon. They encode the evolution of dynamically consistent ambiguity-averse preferences. We focus on establishing dual characterisations of the robust forward criteria, which is advantageous as the dual problem amounts to the search for an infimum whereas the primal problem features a saddle point. Our approach to duality builds on ideas developed in Schied (Finance Stoch. 11:107–129, 2007) and Žitković (Ann. Appl. Probab. 19:2176–2210, 2009). We also study in detail the so-called time-monotone criteria. We solve explicitly the example of an investor who starts with logarithmic utility and applies a quadratic penalty function. Such an investor builds a dynamic estimate of the market price of risk $\hat{\lambda}$ and updates her stochastic utility in accordance with the so-perceived elapsed market opportunities. We show that this leads to a time-consistent optimal investment policy given by a fractional Kelly strategy associated with $\hat{\lambda}$ and with the leverage being proportional to the investor's confidence in her estimate.

Introduction
This paper is a contribution to optimal investment as a problem of normative decisions under uncertainty. This topic is central to financial economics and mathematical finance, and the relevant body of research is large and diverse. Within it, expected utility maximisation (EUM), with its axiomatic foundation going back to von Neumann and Morgenstern [68] and Savage [61], is probably the most widely used and extensively studied framework. In a continuous-time setting, it was first applied to optimal portfolio selection by Merton [49], who proposed a stochastic optimisation problem of the form
$$\max_{\pi} E^{\mathbb{P}}\big[U(X^{\pi}_T)\big], \tag{1.1}$$
where $\mathbb{P}$ is the historical probability measure, $T$ the trading horizon and $U(\cdot)$ the investor's utility function at $T$. Despite the popularity of the above model, there has been a considerable amount of criticism of the model fundamentals $(\mathbb{P}, T, U)$, for these inputs might be ambiguous, inflexible, not very amenable to applications, and difficult to specify.
First, there are numerous issues regarding elucidation and choice of the utility function $U$. Some authors argue that the concept of utility per se is elusive and that one should look for different, more pragmatic criteria to quantify the risk preferences of an investor. We refer the reader to an old note of F. Black [8] where the criterion is the choice of the optimal portfolio (see also He and Huang [29] and Cox et al. [13]), and to Monin [50] where the criterion is a targeted wealth distribution. Another line of research accepts utility as an appropriate device to rank outcomes but challenges the classical EUM, for empirical evidence shows that investors feel differently with respect to gains and losses. See, among others, Hershey and Schoemaker [33] and Kahneman and Tversky [35]; these works led to the development of the area of behavioural finance (see e.g. Barberis and Thaler [4] and Jin and Zhou [34]).
Yet others generalise the concept of utility and move away from terminal-horizon deterministic utilities, such as $U(\cdot)$ above, by allowing state- and path-dependence, which can alleviate several drawbacks of the classical setting. One of the best-known paradigms is that of recursive utilities; see e.g. Duffie and Epstein [18], El Karoui et al. [22], Skiadas [67]. State-dependent utilities have also been considered in static frameworks; see e.g. Drèze [17] and Karni [41].
Second, the investment horizon T might not be fixed or a priori known. Such situations arise, for example, in investment problems with rolling horizons or in problems in which the horizon needs to be modified due to an inflow of new funds, new market opportunities, or new investment options and obligations. In this context, it is natural to study under which model conditions and preference structures one could extend the standard investment problem beyond a pre-specified horizon in a time-consistent manner; see e.g. Källblad [36,Sect. 2.2] and [37]. It is also interesting to study utilities that are not biased by the horizon choice, like the horizon-unbiased utilities introduced by Henderson and Hobson [30]; see also Choulli et al. [12].
Last but not least, an investor frequently faces significant ambiguity as to which market model to use; specifically, how to determine the probability measure P. This is often referred to as Knightian uncertainty in reference to the original contribution of Knight [44]. In the seminal work by Gilboa and Schmeidler [28], motivated by the Ellsberg [23] paradox, the independence axiom was weakened to account for ambiguity aversion which led to a generalised robust EUM paradigm. It built on earlier contributions, including Anscombe and Aumann [2] and Schmeidler [66], and has since been followed and extended in a large number of works; we refer the reader to Maccheroni et al. [47], Schied [64] and to Föllmer et al. [27] and the references therein.
Our work here was motivated by the above considerations on the triplet of model inputs (P, T , U). We propose a framework that alleviates some of the above shortcomings in a unified manner, combining elements from classical robustness theory and the recently developed forward investment performance approach. We now briefly introduce the latter before describing our main contributions.
In the absence of model uncertainty, Musiela and Zariphopoulou [53,54] introduced the forward performance process as an adapted stochastic criterion parametrised by wealth and time, denoted by $U(x,t)$, $t \ge 0$, and constructed "forward in time". Specifically, given today's profile $U(x,t)$, the forward process $U(x,T)$ for an arbitrary upcoming investment horizon $T > t$ is specified so that
$$U(X^{\pi}_t, t) \ge E\big[U(X^{\pi}_T, T) \,\big|\, \mathcal{F}_t\big] \quad \text{for any admissible } \pi,$$
$$U(X^{\pi^*}_t, t) = E\big[U(X^{\pi^*}_T, T) \,\big|\, \mathcal{F}_t\big] \quad \text{for the optimal } \pi^*.$$
This allows considerable flexibility in incorporating changing market opportunities and investors' attitudes in a dynamically consistent manner. In contrast, in the classical formulation, the value function is constructed in a similar manner but in the opposite time direction: the utility criterion is first chosen at the end of the horizon and then the dynamic programming principle generates the solution from T to previous times. The computation of the value function involves the underlying model for market dynamics for the entire investment period and there is no a priori mechanism to extend the investment problem beyond T in a dynamically consistent manner. This induces significant limitations, as discussed below in our motivating example in Sect. 2.1.
In this paper, we build an analogous decision framework for an agent who faces model ambiguity. As in the classical robust EUM, we consider an investor in a stochastic market environment for which she does not know the "true" model. Instead, she describes the market reality through relative weighting of stochastic models with some models being more likely than others, some being excluded altogether, etc. These views are expressed by a penalty function and are updated dynamically in time. The investor's personal evaluation of wealth is expressed through her preferences. When considering a given investment horizon, say T , the investor aims to maximise the robust expected utility (max-min) functional, similarly to Maccheroni et al. [47] and Schied [64]. However, we generalise their criterion by considering stochastic preferences. These preferences evolve forward in time, taking into account the model ambiguity, and are defined for all investment horizons. Accordingly, we call them robust forward criteria. They are encoded by pairs of utility fields and penalty functions which are dynamically consistent.
Our theoretical focus is on defining and further characterising the new investment criteria. We consider their duals and establish an appropriate duality theory. Similarly to Schied [64], as well as Quenez [60] and Schied and Wu [65], the proof of duality proceeds by using an appropriate minimax theorem and then applying a model-specific duality result to the inner maximisation. However, unlike [64] which relied on results of Kramkov and Schachermayer [45], we view the inner maximisation problems under the fixed reference measure P but featuring stochastic utility functions and apply the duality in Žitković [69]. Our proofs involve a number of technical and conceptual novelties. In particular, we prove relevant conjugacy relations and the existence of a dual optimiser for a class of utility functions which are allowed to be stochastic and finite on the entire real line. Notably, the dynamic consistency conditions are imposed jointly on the penalty function and the utility random field. Unlike for convex risk measures or the classical EUM, the dynamic aspects of robust portfolio optimisation seem to have been studied only for specific examples; see e.g. Laeven and Stadje [46] and Müller [51,Chap. 7]. We provide general results which in particular highlight the necessity of a conditional stability property of the penalty functions, see property (2.11) below, in the past only considered for dynamic risk measures. Further, we also obtain the equivalence between dynamic consistency in the primal and dual domain and characterise the latter via a suitable submartingale property. While these are natural properties which are well understood in other contexts, e.g. classical EUM, they appear to be novel in the context of robust portfolio optimisation. We use the dual formulation to study the question of time-consistency of the optimal strategies. 
We show that in general, both in our framework as well as in the classical robust EUM, the optimal strategies may fail to be time-consistent. This is caused by possibly arbitrary dynamics of the penalty functions. We show that time-consistency of the optimal strategies is guaranteed under suitable assumptions of dynamic consistency of the penalty functions.
Apart from the theoretical contribution, we also construct and solve explicitly some practically relevant examples which showcase the advantages of our approach. Most notably, we consider an investor who starts with a logarithmic utility and applies a quadratic penalty function. The investor then builds a dynamic estimate of the market price of risk, say $\hat{\lambda}$, and updates her stochastic utility in accordance with the so-perceived elapsed market opportunities. We show that this leads to a time-consistent optimal investment policy given by a fractional Kelly strategy associated with $\hat{\lambda}$. The leverage is a function of the investor's confidence in the estimate $\hat{\lambda}$. This solution is both intuitive and relevant since it corresponds to strategies often followed by large investors in practice. In the classical robust EUM approach, for a fixed time interval $[0,T]$, such behaviour is consistent with the simplest setting of a complete market and constant penalty weighting and is essentially the only explicit example available with the classical approach; see Hernández-Hernández and Schied [32]. In a more complex setting (e.g. an incomplete market or general adapted penalty weights), this structure is lost; the solution is described via PDE or BSDE methods and the optimal investment strategies may depend on the setting and on the investment horizon $T$. This complexity is due to the entangled nature of solving the problem backwards and having a deterministic boundary constraint at $T$. Our approach, in contrast, does not suffer from such drawbacks and offers a solution which holds in great generality. We discuss this in detail in Sect. 2.1. A further example of an investor initially endowed with an exponential utility is studied in Sect. 2.2. In Sect. 5, we discuss the structure of forward criteria and identify some particular classes of robust forward criteria; this provides us with yet further examples.
The rest of the paper is organised as follows. In Sect. 2, the market assumptions are specified, the robust forward criteria are introduced and motivating examples are studied. In Sect. 3, equivalent dual characterisations of robust forward criteria are established. Then, in Sect. 4, we study the link between dynamic consistency of penalty functions and time-consistency of optimal investment strategies. In particular, we discuss a simple example of criteria leading to time-inconsistent optimal investment strategies. Section 5 is devoted to a mostly formal discussion of various classes of criteria. Our aim is to illustrate the flexibility of the notion and the fact that interesting preferences might be identified under additional evolutionary requirements. In particular, time-monotone criteria are linked to a specific PDE. We also argue that for each robust forward criterion, there exists a specific (standard) forward criterion in the reference market producing the same optimal behaviour. The proofs are deferred to Sect. 6.

Robust forward criteria: motivation and definition
In order to motivate and illustrate the upcoming definition, we first consider two examples. In Sect. 2.1, we build a robust forward criterion which combines logarithmic preferences with a quadratic penalty structure for model ambiguity. The example is of particular interest as it gives theoretical justification for fractional Kelly strategies which are often used in practice. Subsequently, in Sect. 2.2, we consider an example with initial exponential preferences. In Sect. 2.3, we then introduce the general setup and definition.

A motivating example: robust forward criteria yielding fractional Kelly strategies
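Consider a probability space $(\Omega, \mathcal{F}, \mathbb{F}, \hat{\mathbb{P}})$ with the filtration spanned by a two-dimensional $\hat{\mathbb{P}}$-Brownian motion $(\hat{W}_t)_{t\ge0} = (\hat{W}^1_t, \hat{W}^2_t)_{t\ge0}$ and a market with a zero-interest bond and a stock whose price process $(S_t)_{t\ge0}$ solves
$$dS_t = S_t\,\sigma_t\big(\hat{\lambda}_t\,dt + d\hat{W}^1_t\big) \tag{2.1}$$
for some $\mathbb{F}$-progressively measurable processes $\hat{\lambda}$ and $\sigma > 0$. An investor acting in this incomplete market chooses the number of shares of the risky asset to hold, denoted by $(\pi_t)_{t\ge0}$. Her wealth process then follows the dynamics
$$dX^{\pi}_t = \pi_t\,dS_t = \pi_t S_t\,\sigma_t\big(\hat{\lambda}_t\,dt + d\hat{W}^1_t\big).$$
The set of admissible strategies, starting from wealth $x$ at time $t \ge 0$, is given by
$$\mathcal{A}^x_t = \big\{\pi : (\pi_s)_{s\ge t} \text{ is predictable with } (X^{\pi}_s)_{s\ge t} \text{ well defined with } X^{\pi}_t = x \text{ and } X^{\pi}_s > 0 \text{ a.s. for all } s \ge t\big\}.$$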
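Before we introduce model uncertainty, let us discuss this simple setup to highlight the differences between the classical problem (1.1) and the forward performance criteria. An investor solving (1.1) with a time horizon $T$ and utility function $U(x) = \ln x$ is myopic and simply follows the growth-optimal or Kelly [42] strategy, which invests the fraction of wealth $\hat{\lambda}_t/\sigma_t$ in the risky asset, $\pi^*_t = \frac{\hat{\lambda}_t}{\sigma_t S_t} X^{\pi^*}_t$; see Bansal and Lehmann [3] and Kardaras et al. [40] and the references therein for details. While $\pi^*$ does not rely on $T$, or on the particular dynamics of $\hat{\lambda}$ in the future, the value function of the investor with wealth $x$ at time $t$ very much does and is given by
$$u(x; t, T) = \ln x + \frac{1}{2}\,\hat{E}\Big[\int_t^T \hat{\lambda}_s^2\,ds \,\Big|\, \mathcal{F}_t\Big].$$
In contrast, the analogous time-monotone forward performance process, which generates the same optimal investment strategy, is given by
$$U(x,t) = \ln x - \frac{1}{2}\int_0^t \hat{\lambda}_s^2\,ds,$$
which puts value in the context of the elapsed market opportunities instead. This allows considerable flexibility in reassessing the upcoming market evolution in a dynamically consistent manner. Crucially, as we show below, this setup behaves much more naturally when model uncertainty is introduced.

Suppose now that the investor acknowledges model ambiguity. She builds, and updates dynamically, her best estimate $\hat{\mathbb{P}}$ (or equivalently $\hat{\lambda}$) of reality, but she is aware that it might be inaccurate. So the investor considers various other models and quantifies their relative likelihood via a penalty function $\gamma$. To make the setup precise, when making decisions over the interval $[t,T]$, we only consider measures $Q \sim \hat{\mathbb{P}}$ on $\mathcal{F}_T$. We denote by $\mathcal{P}$ the set of all $\mathbb{F}$-progressively measurable processes $(\nu_t)_{t\ge0}$ with $\int_0^T |\nu_t|^2\,dt < \infty$ a.s. for all $T > 0$. Any measure $Q \sim \hat{\mathbb{P}}$ on $\mathcal{F}_T$ may then be identified with a process $\eta = (\eta^1, \eta^2) \in \mathcal{P} \times \mathcal{P}$ via $\frac{dQ}{d\hat{\mathbb{P}}}\big|_{\mathcal{F}_T} = D^{\eta}_T$, with the martingale $(D^{\eta}_t)_{t\ge0}$ given by
$$D^{\eta}_t = \mathcal{E}\Big(\int_0^{\cdot} \eta^1_s\,d\hat{W}^1_s + \int_0^{\cdot} \eta^2_s\,d\hat{W}^2_s\Big)_t;$$
we write $Q = Q^{\eta}$. For the present example, we assign it the penalty
$$\gamma_{t,T}(Q^{\eta}) = E^{Q^{\eta}}\Big[\int_t^T \frac{\delta_s}{2}\,|\eta_s|^2\,ds \,\Big|\, \mathcal{F}_t\Big] \tag{2.3}$$
for some adapted nonnegative process $(\delta_t)$ which controls the strength of the penalisation (cf. also (5.3) below); that is, $(\delta_t)$ quantifies¹ the investor's trust in the estimate $\hat{\mathbb{P}}$. Note that it is natural to expect $\gamma_{t,T}(\cdot)(\omega)$ to have a global minimum at $\hat{\mathbb{P}}|_{\mathcal{F}_T}$. We let $\mathcal{Q}_{t,T}$ denote the set of measures $Q^{\eta}$ with a.s. finite penalty at time $t$. Finally, we assume that there exists $\kappa > 1/2$ such that $\hat{E}\big[\exp\big(\kappa \int_0^T \hat{\lambda}_s^2\,ds\big)\big] < \infty$ for all $T > 0$; this is a convenient integrability assumption which can be interpreted as $\hat{\mathbb{P}}$ being reasonable. We then have the following result, the proof of which is reported in Sect. 6.

Proposition 2.1 Given the investor's choice of $(\hat{\lambda}_t)$ and $(\delta_t)$ as above, let
$$\hat{\eta}_t = \Big(-\frac{\hat{\lambda}_t}{1+\delta_t},\, 0\Big), \qquad \hat{\pi}_t = \frac{\delta_t}{1+\delta_t}\,\frac{\hat{\lambda}_t}{\sigma_t}\,\frac{X^{\hat{\pi}}_t}{S_t}, \tag{2.4}$$
and
$$U(x,t) = \ln x - \frac{1}{2}\int_0^t \frac{\delta_s}{1+\delta_s}\,\hat{\lambda}_s^2\,ds. \tag{2.5}$$
Recall that the penalty $\gamma$ is given by (2.3). Then for all $0 \le t \le T < \infty$,
$$U(x,t) = \operatorname*{ess\,sup}_{\pi \in \mathcal{A}^x_t}\ \operatorname*{ess\,inf}_{Q \in \mathcal{Q}_{t,T}} \Big\{E^{Q}\big[U(X^{\pi}_T, T) \,\big|\, \mathcal{F}_t\big] + \gamma_{t,T}(Q)\Big\}, \tag{2.6}$$
and the optimum is attained for the saddle point $(\hat{\eta}, \hat{\pi})$ given in (2.4).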
The investment strategy given in (2.4) corresponds to strategies used in practice by some of the large fund managers. Specifically, it is a fractional Kelly strategy where the investor invests in the growth-optimal (Kelly) portfolio corresponding to her best estimate of the market price of risk $\hat{\lambda}$. However, she is not fully invested but instead chooses a leverage² proportional to her trust in the estimate $\hat{\lambda}$. If $\delta_t \to \infty$ (infinite trust in the estimation), then $\hat{\pi}_t S_t / X^{\hat{\pi}}_t \to \hat{\lambda}_t/\sigma_t$, which is the Kelly strategy associated with the most likely model $\hat{\mathbb{P}}$. On the other hand, if $\delta_t \to 0$ (no trust in the estimation), then $\hat{\pi}_t \to 0$ and the optimal behaviour is to invest nothing. We stress that $\hat{\lambda}$ and $\delta$ are the investor's arbitrary inputs. In particular, there is no assumption that $\hat{\lambda}$ is a good estimate of some "true" market price of risk $\lambda$. For the dynamic consistency (2.6), it is only crucial that the investor's utility function (2.5) evolves as a function of the investor's perception of the market. The above solution is intuitive, practically relevant and robust. It is insightful to compare it with the classical robust EUM framework. The latter would fix an investment horizon $T$ and take $U(x,T) = \ln x$ with (2.6) defining the value field for $t \le T$. For some simple setups, e.g. a complete market with $\delta_t \equiv \delta$, this would lead to the same optimal investment strategy $\hat{\pi}$ as in (2.4); cf. Hernández-Hernández and Schied [32]. However, in more general setups, the optimal strategy would not be explicit, and would depend on $T$ and the set of measures $\mathcal{Q}_{t,T}$ in a complex way; see e.g. [32,46] and [51, Chap. 7]. This is due to the requirement to match a pre-specified deterministic utility at a future target date, which implies that the robust EUM entangles model ambiguity with horizon specification in a rather complex way, leading to a loss of the intuitive structure of the solution. There are further important advantages of our approach. The classical robust EUM would result in a value function which is defined on $[0,T]$ and has a non-trivial volatility, while (2.5) is defined for all time horizons simultaneously and is monotone in time; see Sect. 5 for a further discussion of such structural properties.

¹ For $\delta_t \equiv \delta$ constant, the penalty function in (2.3) corresponds to the entropic penalty $\gamma(Q) = \delta H(Q \mid \hat{\mathbb{P}})$, for which the optimisation problem in (2.6) may be reformulated as a pure maximisation problem with a modified utility function (if considering utility from intertemporal consumption, such penalty functions still yield non-trivial problems; see among others [9,67]). For $(\delta_t)$ being a general process, the situation is however different.
² In practice, the leverage often has a risk interpretation, e.g. it is adjusted to achieve a targeted level of volatility for the fund. It is adjusted rarely in comparison to the dynamic updating of the estimate $\hat{\lambda}$. Similarly, in our framework, the trust in one's estimation methods is likely to be adjusted on a much slower scale than the changes to the estimate itself.
We believe that the above example showcases the advantages of our approach over the classical robust EUM. More generally, our idea behind the robust forward criteria is to take the condition (2.6) of dynamic consistency as the defining property, and to study the corresponding class of investment criteria: we say that a pair of mappings, namely a utility (random) field $U : \Omega \times [0,\infty) \times \mathbb{R} \to \mathbb{R}$ and a penalty function $\gamma : \{Q \sim \mathbb{P}\} \to \mathbb{R}$, is a robust forward criterion if they satisfy this property for all $0 \le t \le T < \infty$; see Definition 2.6 below for the formal definition. This class of preferences provides dynamically consistent investment criteria which are well defined for all investment horizons. We note that with this terminology, the pair $(U, \gamma)$ defined in Proposition 2.1 is a robust forward criterion for which the fractional Kelly strategy is optimal.
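The saddle-point structure behind the fractional Kelly strategy can be checked numerically in a stripped-down setting. The Python sketch below is an illustration under stated assumptions, not the construction used in the proofs: coefficients are constant, only instantaneous rates are considered, and the objective is taken to be the drift of $\ln X$ under $Q^{\eta}$ plus a quadratic penalty rate $\frac{\delta}{2}(\eta^1)^2$ (consistent with the entropic case $\delta H(Q \mid \hat{\mathbb{P}})$ for constant $\delta$). The function names and the grid search are hypothetical helpers introduced here.

```python
import numpy as np


def fractional_kelly_fraction(lam_hat, sigma, delta):
    """Fraction of wealth held in the stock: delta/(1+delta) * lam_hat/sigma."""
    return delta / (1.0 + delta) * lam_hat / sigma


def robust_log_rate(f, eta, lam_hat, sigma, delta):
    """Assumed objective rate: drift of ln X under Q^eta plus the penalty rate.

    f   : fraction of wealth invested in the stock
    eta : distortion of the drift of the first Brownian motion
    """
    growth = f * sigma * (lam_hat + eta) - 0.5 * (f * sigma) ** 2
    penalty = 0.5 * delta * eta ** 2
    return growth + penalty


def grid_saddle_point(lam_hat, sigma, delta, n=401):
    """Brute-force max over f of min over eta on a rectangular grid."""
    fs = np.linspace(0.0, 2.0 * lam_hat / sigma, n)
    etas = np.linspace(-2.0 * lam_hat, 2.0 * lam_hat, n)
    F, E = np.meshgrid(fs, etas, indexing="ij")
    values = robust_log_rate(F, E, lam_hat, sigma, delta)
    worst_case = values.min(axis=1)   # inner infimum over eta for each f
    i = int(worst_case.argmax())      # outer supremum over f
    j = int(values[i].argmin())       # minimising eta at the optimal f
    return fs[i], etas[j], worst_case[i]


if __name__ == "__main__":
    lam_hat, sigma, delta = 0.4, 0.2, 1.0
    f_star, eta_star, value = grid_saddle_point(lam_hat, sigma, delta)
    print("grid fraction:", f_star,
          "closed form:", fractional_kelly_fraction(lam_hat, sigma, delta))
    print("grid eta:", eta_star, "closed form:", -lam_hat / (1.0 + delta))
    print("grid rate:", value,
          "closed form:", 0.5 * delta / (1.0 + delta) * lam_hat ** 2)
```

In this toy setting the grid optimiser should land close to the closed-form fraction $\frac{\delta}{1+\delta}\frac{\hat{\lambda}}{\sigma}$, and the optimal rate matches the integrand that is subtracted in the forward utility. The grid for $\eta$ is sized for moderate $\delta$; for very small $\delta$ the inner minimiser would leave the grid and the range would need widening.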

Second example: robust forward criteria for wealth on R
In our motivating example studied above, wealth was assumed to be positive. We now present a second explicit example where wealth is allowed to become negative, which will be the setup of our abstract definitions in Sect. 2.3. The example starts with well-studied and canonical choices in economics: preferences which exhibit constant absolute risk aversion and a multiple-prior (coherent) penalty originally derived via an axiomatic approach to preferences by Gilboa and Schmeidler [28]. The underlying setup is the same as in the previous example, with the investor's best estimate of the market denoted by $\hat{\mathbb{P}}$, under which the underlying (incomplete) market is specified via (2.1), and we assume that $\hat{E}\big[\exp\big(2\int_0^u \hat{\lambda}_s^2\,ds\big)\big] < \infty$ for all $u > 0$. The investor's trust in her current estimation is now described through a predictable process $\alpha$, with $0 \le \alpha_t \le \hat{\lambda}_t$, in that she considers all models for which the market price of risk is at most $\alpha$ away from the current best estimate $\hat{\lambda}$. That is, for the investment interval $[t,T]$, the investor considers the set of models
$$\mathcal{Q}_{t,T} = \big\{Q^{\eta} \sim \hat{\mathbb{P}}|_{\mathcal{F}_T} : \eta = (\eta^1, 0) \text{ and } |\eta^1_s| \le \alpha_s,\ t \le s \le T\big\}, \tag{2.7}$$
where we assume for simplicity³ that the investor is confident about her modelling of the market factor $\hat{W}^2$. In practice, $\hat{\lambda}_t$ is likely to be estimated using statistical methods, and we may think of $\alpha_t$ as the width of the confidence interval. We consider a coherent penalty function $(\gamma_{t,T})_{0\le t\le T}$, assigning the penalty $\gamma_{t,T}(Q^{\eta}) = 0$ for $Q^{\eta} \in \mathcal{Q}_{t,T}$ and $\gamma_{t,T}(Q^{\eta}) = \infty$ otherwise. We consider at $t \ge 0$ the class of admissible strategies
$$\mathcal{A}^x_t := \big\{\pi : (\pi_s)_{s\ge t} \text{ is predictable with } (X^{\pi}_s)_{s\ge t} \text{ well defined with } X^{\pi}_t = x, \text{ and } \ldots\big\},$$
where the latter part imposes an integrability condition in each market model the investor considers plausible. We note that in Sect. 3, when proving general duality results, we do not investigate existence of optimal strategies and therefore simply restrict to bounded wealth processes. The parameter $a > 0$ is effectively used to model the investor's risk aversion (cf. (2.9) below); the more risk-averse the investor is, the smaller her set of available trading strategies.⁴ The following result is proved in Sect. 6.

Proposition 2.2
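Given the investor's choice of $(\hat{\lambda}_t)$ and $(\alpha_t)$ as above, suppose that $0 \le \alpha_t \le \hat{\lambda}_t$, $t \ge 0$, and let
$$\hat{\eta}_t = (-\alpha_t,\, 0), \qquad \hat{\pi}_t = \frac{\hat{\lambda}_t - \alpha_t}{a\,\sigma_t S_t}, \tag{2.8}$$
and
$$U(x,t) = -\exp\Big(-a x + \frac{1}{2}\int_0^t (\hat{\lambda}_s - \alpha_s)^2\,ds\Big). \tag{2.9}$$
Recall that the penalty $\gamma$ is of coherent type with $\mathcal{Q}_{t,T}$ given by (2.7). Then for all $0 \le t \le T < \infty$,
$$U(x,t) = \operatorname*{ess\,sup}_{\pi \in \mathcal{A}^x_t}\ \operatorname*{ess\,inf}_{Q \in \mathcal{Q}_{t,T}} \Big\{E^{Q}\big[U(X^{\pi}_T, T) \,\big|\, \mathcal{F}_t\big] + \gamma_{t,T}(Q)\Big\}, \tag{2.10}$$
and the optimum is attained for the saddle point $(\hat{\eta}, \hat{\pi})$ given in (2.8).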
Equation (2.10) is a dynamic consistency relation which, as argued above, will be the defining property for our robust decision criteria. Indeed, the pair $(U, \gamma)$ is a robust forward criterion as defined below in Definition 2.6. The choice of a penalty of multiple-priors type means that all measures are considered equally likely, and in consequence the strategy is adjusted to the worst-case scenario. Specifically, the investor invests an amount proportional to the Sharpe ratio in the worst market model among the ones she considers plausible, with the proportion depending on the risk aversion: the higher the risk aversion, the smaller the amount invested. Similar results were obtained in the case of a complete market setup in [63]. Robust EUM with exponential utilities and multiple-priors preferences has also been studied by use of stochastic control methods in [46,58] and [51, Sect. 7.2]. In contrast to these studies, and in analogy to the logarithmic example in Sect. 2.1, in our setup the natural behaviour in [63] extends to more general markets. As before, this is possible since we disentangle the ambiguity of model selection from the horizon specification.
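The worst-case logic of this example can be illustrated with a short Python check. The sketch assumes constant coefficients and a forward exponential utility of the form $U(x,t) = -\exp\big(-ax + \frac{1}{2}(\hat{\lambda}-\alpha)^2 t\big)$; both are assumptions of this illustration. Under $Q^{\eta}$ with $\eta = (\eta^1, 0)$ and volatility exposure $\theta$ (cash amount in the stock times $\sigma$), Itô's formula gives the process $\exp(-aX_t + \frac{1}{2}(\hat{\lambda}-\alpha)^2 t)$ the relative drift $-a\theta(\hat{\lambda}+\eta^1) + \frac{1}{2}a^2\theta^2 + \frac{1}{2}(\hat{\lambda}-\alpha)^2$. The function names are hypothetical.

```python
import numpy as np


def worst_case_cash_amount(lam_hat, alpha, a, sigma):
    """Cash amount held in the stock: (lam_hat - alpha) / (a * sigma)."""
    return (lam_hat - alpha) / (a * sigma)


def relative_drift(theta, eta, lam_hat, alpha, a):
    """Relative drift of exp(-a*X_t + 0.5*(lam_hat - alpha)^2 * t) under Q^eta.

    theta is the volatility exposure (cash amount times sigma). Because
    U = -exp(...), a negative value means U(X_t, t) drifts upwards.
    """
    return (-a * theta * (lam_hat + eta)
            + 0.5 * (a * theta) ** 2
            + 0.5 * (lam_hat - alpha) ** 2)


def check_saddle(lam_hat, alpha, a, sigma, n=201, tol=1e-12):
    """Check both saddle-point inequalities on grids of models and strategies."""
    theta_star = worst_case_cash_amount(lam_hat, alpha, a, sigma) * sigma
    etas = np.linspace(-alpha, alpha, n)
    thetas = np.linspace(-2.0 * lam_hat / a, 2.0 * lam_hat / a, n)
    # Under every plausible model, the candidate strategy never lowers U.
    strategy_ok = bool(
        np.all(relative_drift(theta_star, etas, lam_hat, alpha, a) <= tol))
    # Under the worst-case model eta^1 = -alpha, no strategy raises U.
    model_ok = bool(
        np.all(relative_drift(thetas, -alpha, lam_hat, alpha, a) >= -tol))
    return strategy_ok, model_ok


if __name__ == "__main__":
    print(worst_case_cash_amount(0.4, 0.1, 2.0, 0.2))
    print(check_saddle(0.4, 0.1, 2.0, 0.2))
```

Both checks reduce to elementary identities: at the candidate exposure the relative drift equals $-(\hat{\lambda}-\alpha)(\alpha+\eta^1) \le 0$, and at $\eta^1 = -\alpha$ it equals $\frac{1}{2}\big(a\theta - (\hat{\lambda}-\alpha)\big)^2 \ge 0$, which is why the condition $0 \le \alpha \le \hat{\lambda}$ matters.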

Definition of robust forward performance criteria
We now turn to a general market setup and define the robust forward criteria.

The underlying market assumptions
The market consists of $d+1$ securities whose prices $(S^0_t, S_t)_{t\ge0}$ are modelled on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ whose filtration satisfies the usual conditions. We let $S^0 \equiv 1$ and assume $S$ to be locally bounded. A portfolio process $\pi = (\pi_t)_{t\in[0,\infty)}$ is an $\mathbb{F}$-predictable process which is $S$-integrable on $[0,T]$ for each $T > 0$ and denotes the number of shares held in the risky assets. The associated wealth process $X^{\pi}$ is given by
$$X^{\pi}_t = \int_0^t \pi_s\,dS_s, \qquad t \ge 0.$$
The set of admissible portfolio processes available to the investor is denoted by $\mathcal{A}$ and is typically a subset of all portfolio processes. For each $T > 0$, $\mathcal{M}^e_T$ denotes the set of equivalent local martingale measures, that is, the set of measures $Q$ on $\mathcal{F}_T$ such that $Q \sim \mathbb{P}|_{\mathcal{F}_T}$ and each component of $S$ is a $Q$-local martingale. Similarly, $\mathcal{M}^a_T$ denotes the set of absolutely continuous local martingale measures. The corresponding sets of density processes are denoted, respectively, by $\mathcal{Z}^e_T$ and $\mathcal{Z}^a_T$. Put differently,
$$\mathcal{Z}^e_T = \Big\{(Z_t)_{t\in[0,T]} : Z_t = \frac{dQ}{d\mathbb{P}}\Big|_{\mathcal{F}_t},\ Q \in \mathcal{M}^e_T\Big\},$$
and similarly for $\mathcal{Z}^a_T$. For any nonnegative martingale $Z_t$, $t \le T$, and in particular for density processes in $\mathcal{Z}^a_T$, we use the notation $Z_{s,t} := Z_t/Z_s$ on $\{Z_s > 0\}$, $s \le t \le T$. We impose the following assumption throughout:

Assumption 2.3 $\mathcal{M}^e_T \neq \emptyset$ for all $T > 0$.

This assumption is referred to as the absence of arbitrage (NFLVR) on finite horizons; see [69, Sect. 2] for further discussion. Note that there need not exist a set $\mathcal{M}^e$ of probability measures equivalent to $\mathbb{P}$ such that $\mathcal{M}^e_T = \{Q|_{\mathcal{F}_T} : Q \in \mathcal{M}^e\}$ for all $T > 0$. As argued in [69], the condition NFLVR on finite horizons implies that any density process $Z \in \mathcal{Z}^e_T$ can be extended to a strictly positive martingale $(Z_t)_{t\in[0,\infty)}$ such that $Z_0 = 1$ and $ZS$ is a local martingale. The set of all such processes $Z$ is denoted by $\mathcal{Z}^e$. In particular, NFLVR on finite horizons holds if and only if $\mathcal{Z}^e$ is nonempty. If the condition of strict positivity is replaced by the one of nonnegativity, the obtained family is denoted by $\mathcal{Z}^a$.

Utility random fields and penalty functions
The robust forward criteria which we introduce below combine two elements: a utility random field U(ω, x, t), t ≥ 0, and a family of penalty functions γ t,T (Q), 0 ≤ t ≤ T < ∞. The component U(ω, · , t) models the preferences at time t and may depend on past observations. In addition, the investor faces ambiguity about the "true model" for the dynamics of the financial assets and forms a view about the relative plausibility of different probability measures; this is reflected in γ t,T (Q)(ω) which gives the weighting of measures Q on F T . From now on, we focus on the case of U defined on R; this simplifies some aspects of the duality theory, as explained in Sect. 3 below. Alterations of our abstract definitions to the case of U on R + are immediate.

Definition 2.4
A random field is a mapping $U : \Omega \times \mathbb{R} \times [0,\infty) \to \mathbb{R}$ which is measurable with respect to the product of the optional $\sigma$-algebra on $\Omega \times [0,\infty)$ and $\mathcal{B}(\mathbb{R})$. A utility random field is a random field which satisfies the following conditions: … a.s. a strictly concave and strictly increasing $C^1(\mathbb{R})$-function which satisfies the Inada conditions …

In what follows, we suppress $\omega$ from the notation and simply write $U(x,t)$.
The penalty function $\gamma_{t,T}(\cdot)$ should not distinguish between two probability measures $Q_1, Q_2 \sim \mathbb{P}|_{\mathcal{F}_T}$ which agree at time $t$ when considering the horizon $[t,T]$, i.e., … (2.11). This means that $\gamma_{t,T}(Q)$ is a function of the conditional density …. This justifies the following definition.
… $[0,+\infty]$-valued random variables which satisfies the following conditions: … Moreover, for a given utility random field $U(x,t)$ and a set of admissible strategies $\mathcal{A}$, we say that …

We note that conditional convexity (i) above readily implies (2.11). Conversely, (2.11) together with convexity only for deterministic $\lambda$ implies conditional convexity for simple $\lambda \in L^0(\mathcal{F}_t)$, which then yields (i) by using the continuity in (ii).
Condition (2.11) above simply says that if at time t an investor considering [t, T ] cannot tell apart Q 1 from Q 2 , then she assigns them the same penalty. To the best of our knowledge, such a condition has previously not been invoked in the context of robust portfolio optimisation, but it is required here since unlike previous works, we consider a dynamic problem and prove conditional conjugacy relations. Analogous conditions have appeared before in the context of dynamic risk measures; see Definition 3.11 of the local property of penalty functions in Cheridito et al. [11] or the pasting property in Lemma 3.3 in Klöppel and Schweizer [43]. Its importance here becomes apparent in the proof of Lemma 6.3.
In the above definition, Q t,T is the set of feasible measures considered at time t when investing over [t, T ]. It may depend on t and T but is non-random. Both larger and smaller sets could be used, e.g. the (random) set of measures Q with γ t,T (Q)(ω) < ∞ or the set of measures Q with E[γ t,T (Q)] < ∞. However, for many natural penalty functions, these different choices lead to the same value function. Finally, note that we do not impose any regularity or consistency assumptions on γ t,T (Q) in the time variables. These are not necessary for the abstract results in Sect. 3 and will be introduced later when they appear naturally; see Assumption 4.1.

Robust forward performance criteria
We are now ready to introduce the robust forward criteria. As highlighted above, these are pairs (U, γ ) which exhibit a dynamic consistency akin to the dynamic programming principle.

Definition 2.6
Let $U$ be a utility random field, $\mathcal{A}$ a set of admissible strategies and $\gamma$ an admissible family of penalty functions. We say that $(U, \gamma)$ is a robust forward criterion if for all $0 \le t \le T < \infty$ and all $\xi \in L^{\infty}(\mathcal{F}_t)$,
$$U(\xi, t) = \operatorname*{ess\,sup}_{\pi \in \mathcal{A}}\ \operatorname*{ess\,inf}_{Q \in \mathcal{Q}_{t,T}} \Big\{E^{Q}\Big[U\Big(\xi + \int_t^T \pi_s\,dS_s,\, T\Big) \,\Big|\, \mathcal{F}_t\Big] + \gamma_{t,T}(Q)\Big\} \quad \text{a.s.} \tag{2.12}$$

We note that the above definition is well posed. Indeed, given the assumptions on $U$ and $\gamma$, the conditional expectations in (2.12) are well-defined (extended-valued) random variables. Since all $Q \in \mathcal{Q}_{t,T}$ are equivalent to $\mathbb{P}$, for each $\pi \in \mathcal{A}$, the essential infimum is also well defined (extended-valued) with respect to the reference measure $\mathbb{P}$. The set of admissible strategies $\mathcal{A}$ which we consider is specified below; more generally, and in particular if $U$ were defined on $\mathbb{R}_+$, one might need to take an $\mathcal{A}$ which depends on $(\xi, t)$. Naturally, we call an admissible strategy optimal if it attains the supremum in (2.12). However, our definition of robust forward criteria does not require the existence of optimal investment strategies. In that aspect, we follow the approach in [69] rather than the original definition in [53,54]. This is particularly helpful for the duality theory developed in Sect. 3.
Example 2.7 An example of a robust forward criterion as in Definition 2.6 is given by the pair (U, γ ) considered in Proposition 2.2. The pair (U, γ ) in Proposition 2.1 is an example corresponding to an analogous definition, but for the case of random fields defined on R + . We discuss further examples below; see in particular Sect. 5.
The optimisation in (2.12) fits within the robust EUM paradigm as discussed in the introduction. The crucial difference is that we require (2.12) to hold for all time pairs t ≤ T . We refer to (2.12) as the dynamic consistency property of (U, γ ); allowing model ambiguity, it provides a direct extension of the notion of self-generating utility fields studied in [69] and, consequently, of the notion of forward performance criteria; see the introduction and Sect. 5.
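To relate (2.12) to the more classical dynamic programming principle, it is useful to introduce the family of value functions $\{u(\,\cdot\,;t,T) : 0 \le t \le T < \infty\}$ given by

$$u(\xi;t,T) := \operatorname*{ess\,sup}_{\pi\in\mathcal{A}}\ \operatorname*{ess\,inf}_{Q\in\mathcal{Q}_{t,T}} \Big\{ E^{Q}\Big[ U\Big(\xi + \int_t^T \pi_s\,dS_s,\, T\Big) \,\Big|\, \mathcal{F}_t \Big] + \gamma_{t,T}(Q) \Big\}, \qquad \xi \in L^\infty(\mathcal{F}_t). \tag{2.13}$$

Then $(U,\gamma)$ is a robust forward criterion if and only if $U(\xi,t) = u(\xi;t,T)$ a.s. for all $0 \le t \le T < \infty$ and all $\xi \in L^\infty(\mathcal{F}_t)$. This then implies a familiar DPP (or martingale optimality principle), namely

$$u(\xi;t,T) = \operatorname*{ess\,sup}_{\pi\in\mathcal{A}}\ \operatorname*{ess\,inf}_{Q\in\mathcal{Q}_{t,r}} \Big\{ E^{Q}\Big[ u\Big(\xi + \int_t^r \pi_s\,dS_s;\, r, T\Big) \,\Big|\, \mathcal{F}_t \Big] + \gamma_{t,r}(Q) \Big\} \quad \text{a.s.}, \qquad t \le r \le T. \tag{2.14}$$

The setting of (2.13) corresponds to a very general robust EUM, but we note that it also has its limitations. For example, the penalty $\gamma_{t,T}(Q)$ associated to a given measure is fixed and independent of wealth. This has important implications for the time-consistency of optimal investment strategies. Indeed, as we show in Proposition 4.4, when the $(\gamma_{t,T})$ are dynamically consistent and if we have saddle points $(\bar\pi^{t,T}, Q^{t,T})$ solving (2.13), then $Q^{t,r} = Q^{t,T}|_{\mathcal{F}_r}$, $t \le r \le T$, and also the optimal investment strategies are time-consistent. However, in all generality, we could have (dynamically consistent) robust forward criteria which lead to time-inconsistent optimal strategies; an example is given in Sect. 4.

Independence of $\gamma_{t,T}(Q)$ from the investor's wealth is also contrary to the empirical evidence as discussed in behavioural finance, see e.g. Kahneman and Tversky [35], which points to the importance of the investor's reference point for judging scenarios. In consequence, we believe that it might be interesting to study generalisations of the problem in (2.13). Within the framework of robust EUM, these are possible using quasi-concave utility functionals introduced in Cerreia-Vioglio et al. [10]. Their use for the (classical) optimal investment problem has recently been investigated by Källblad [38].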

Dual characterisation of robust forward criteria
Dual methods have proved useful for the study of optimal investment problems and this applies also within our setup. In particular, while the primal problem features a saddle point, the dual problem amounts to the search for a pure infimum, and robust forward criteria are therefore easier to characterise in the dual rather than the primal domain. The aim of this section is to establish the equivalence between dynamic consistency in the primal and the dual domain.
We focus on utility random fields which are finite on the entire real line. The reasons are twofold. First, we complement the work of Schied [64] where only utilities defined on the positive half-line were studied. Second, this simplifies certain technical aspects, see also e.g. [25], and allows us to focus on the novelty of our setting. We note that allowing negative wealth usually complicates the choice of an appropriate set of admissible strategies yielding the existence of an optimiser; cf. [59,62]. This is not a concern for us since we do not require the existence of a primal optimiser, and hence, without loss of generality, we can restrict to the set of bounded wealth processes.⁵

Accordingly, we set $\mathcal{A} = \mathcal{A}_{bd}$ in Definitions 2.4 and 2.6, where $\mathcal{A}_{bd}$ is the set of all portfolios producing bounded wealth processes. Specifically, $\mathcal{A}_{bd} = \bar{\mathcal{A}} \cap (-\bar{\mathcal{A}})$, where $\bar{\mathcal{A}}$ is the set of all admissible portfolio processes for which for any $T > 0$, there exists a constant $c > 0$ such that

$$\int_0^t \pi_s\,dS_s \ge -c, \qquad 0 \le t \le T.$$

Given a utility random field $U$, the associated dual random field, denoted by $V$, is given by

$$V(y,t) := \sup_{x\in\mathbb{R}} \big( U(x,t) - xy \big), \qquad y \ge 0,\ t \ge 0. \tag{3.1}$$

The notion of dynamic consistency in the dual domain is then defined as follows.
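As a concrete illustration of the passage from a utility to its dual, the following worked computation takes a deterministic exponential utility, which is finite on the whole real line. It assumes that the dual field is the usual convex conjugate $V(y) = \sup_{x}(U(x) - xy)$, $y \ge 0$; the specific choice $U(x) = -e^{-x}$ is an illustrative assumption and is not taken from the text.

```latex
% Illustrative assumption: U(x) = -e^{-x}, x in R (time argument suppressed).
\begin{align*}
V(y) &= \sup_{x\in\mathbb{R}} \big( -e^{-x} - xy \big), \qquad y > 0,\\
0 &= e^{-x} - y \iff x^* = -\ln y,\\
V(y) &= -y + y\ln y = y(\ln y - 1), \qquad V(0) = \sup_{x}\big(-e^{-x}\big) = 0 .
\end{align*}
% Biconjugation recovers U: minimise V(y) + xy over y > 0.
\begin{align*}
0 &= \ln y + x \iff y^* = e^{-x},\\
\inf_{y>0}\big( V(y) + xy \big) &= e^{-x}(-x-1) + x e^{-x} = -e^{-x} = U(x).
\end{align*}
```

For $y < 0$ the supremum is $+\infty$ (let $x \to \infty$), which is why the dual variable is restricted to $y \ge 0$.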
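Definition 3.1 A pair $(V,\gamma)$ consisting of a dual random field and a family of penalty functions is dynamically consistent if for all $0 \le t \le T < \infty$ and all $\eta \in L^0_+(\mathcal{F}_t)$,

$$V(\eta,t) = \operatorname*{ess\,inf}_{Q\in\mathcal{Q}_{t,T}}\ \operatorname*{ess\,inf}_{Z\in\mathcal{Z}^a_T} \Big\{ E^{Q}\Big[ V\Big(\eta\,\frac{Z_{t,T}}{Z^Q_{t,T}},\, T\Big) \,\Big|\, \mathcal{F}_t \Big] + \gamma_{t,T}(Q) \Big\} \quad \text{a.s.} \tag{3.2}$$

For later use, we also introduce the dual value field. For any $0 \le t \le T < \infty$ and $\eta \in L^0_+(\mathcal{F}_t)$, let

$$v(\eta;t,T) := \operatorname*{ess\,inf}_{Q\in\mathcal{Q}_{t,T}}\ \operatorname*{ess\,inf}_{Z\in\mathcal{Z}^a_T} \Big\{ E^{Q}\Big[ V\Big(\eta\,\frac{Z_{t,T}}{Z^Q_{t,T}},\, T\Big) \,\Big|\, \mathcal{F}_t \Big] + \gamma_{t,T}(Q) \Big\}.$$

It follows that a pair $(V,\gamma)$ consisting of a dual random field and a family of penalty functions is dynamically consistent if, and only if, for all $0 \le t \le T < \infty$ and all $\eta \in L^0_+(\mathcal{F}_t)$, $V(\eta,t) = v(\eta;t,T)$ a.s.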

Equivalence between primal and dual dynamic consistency
We first introduce the following technical assumption.
Assumption 3.2 For all $0 \le t \le T < \infty$, the set $\{Z^Q_{t,T} : Q \in \mathcal{Q}_{t,T}\}$ is weakly compact, and the set $\{Z^Q_{t,T} U^-(x,T) : Q \in \mathcal{Q}_{t,T}\}$ is uniformly integrable for any $x \in \mathbb{R}$. In addition, for any $Q \in \mathcal{Q}_{t,T}$ and any nonincreasing sequence $(D_n)_{n\in\mathbb{N}}$ in $\mathcal{F}_T$ with $\bigcap_n D_n = \emptyset$, there exists a sequence $(a_n)_{n\in\mathbb{N}}$ in $(0,\infty)$ such that $a_n \to \infty$ and lim inf

For measures $Q \in \mathcal{Q}_{t,T}$ such that $Z^Q_{t,T} U(x,T) \in L^1$ for some and hence for all $x \in \mathbb{R}$, although seemingly weaker, the second part of the above assumption is equivalent to the fact that the stochastic utility function $Z^Q_{t,T} U(\,\cdot\,,T)$ satisfies the non-singularity condition in Definition 3.3 in [69]. This is a mild technical assumption which precludes pathological appearances of non-countably additive measures in the dual treatment. In particular, it is satisfied whenever the utility field is $(x,\omega)$-uniformly bounded from below by a deterministic utility function; see [69, Remark 3.4]. We also note that since $\{Z^Q_{t,T} : Q \in \mathcal{Q}_{t,T}\}$ is convex, weak compactness is equivalent to closedness in $L^0$; cf. [65, Lemma 3.2].
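Next, we present the first main result, which yields the conjugacy relations between the functions $u(x;t,T)$ and $v(y;t,T)$. We stress that even for $t = 0$, Theorem 3.3 differs from Theorem 2.4 in [64] in that $U(\,\cdot\,,T)$ is defined on the entire real line and allowed to be stochastic, and, moreover, we do not impose any finiteness assumptions. The proof is reported in Sect. 6.1.

Theorem 3.3 Let $(U,\gamma)$ be a pair of a utility random field and an admissible family of penalty functions, suppose that Assumption 3.2 holds, and let $V$ be the associated dual random field. Then for all $0 \le t \le T < \infty$, $\xi \in L^\infty(\mathcal{F}_t)$ and $\eta \in L^0_+(\mathcal{F}_t)$, the associated value fields satisfy

$$u(\xi;t,T) = \operatorname*{ess\,inf}_{\eta\in L^0_+(\mathcal{F}_t)} \big( v(\eta;t,T) + \xi\eta \big) \quad \text{a.s.}$$

and

$$v(\eta;t,T) = \operatorname*{ess\,sup}_{\xi\in L^\infty(\mathcal{F}_t)} \big( u(\xi;t,T) - \xi\eta \big) \quad \text{a.s.}$$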

In consequence, the combination of a utility random field U(x, t) and a family of penalty functions γ t,T is dynamically consistent if and only if the combination of the dual random field V (y, t) and γ t,T is dynamically consistent.
The next result shows that the dual problem admits a solution even though the primal problem need not (since we have restricted to the use of bounded wealth strategies).

Proposition 3.4 Let $(U,\gamma)$ be a pair of a utility random field and an admissible family of penalty functions, and let $V$ be the associated dual random field. Suppose that Assumption 3.2 holds. Then for any $t \le T < \infty$ and $\eta \in L^1_+(\mathcal{F}_t)$, there exist $Q \in \mathcal{Q}_{t,T}$ and $Z \in \mathcal{Z}^a_T$ attaining the infimum in (3.2).

We provide the proof in Sect. 6.1, but remark that the fact that the second component of the optimiser lies in M a T (as opposed to the larger set of finitely additive measures) is a consequence of the utility function being finite on the entire real line (see [69] and also [5,62]).
We work here under the assumption that the measures in $\mathcal{Q}_{t,T}$ are equivalent to the reference measure. However, under the convention that $Z^Q_{t,T} V(\eta Z_{t,T}/Z^Q_{t,T}) = \infty$ on $\{Z^Q_{t,T} = 0\}$, our proofs go through with straightforward modifications also when allowing $\mathcal{Q}_{t,T}$ to include all measures absolutely continuous with respect to the reference measure with finite penalty a.s.; cf. [38] for similar results in the case of (static) utility functions defined on the positive half-line. We also expect that our proofs might be further developed so as to rely on weak compactness of level sets of the form $\{Z^Q_{t,T} : Q \ll P|_{\mathcal{F}_T} \text{ and } \gamma_{t,T}(Q) \le \xi \text{ a.s.}\}$, $\xi \in L^\infty(\mathcal{F}_t)$, rather than of $\mathcal{Q}_{t,T}$. We leave this topic for future research.

Dynamic consistency of penalty functions and time-consistency of optimal investment strategies
The definition of robust forward criteria requires the combined criterion consisting of U(x, t) and γ t,T to be dynamically consistent (cf. Definition 2.6). In this section, we further investigate this assumption and relate it to dynamic consistency of the penalty functions and time-consistency of the optimal investment strategies. The corresponding proofs are reported in Sect. 6.2. Moreover, For any penalty function satisfying (4.1), Q t,T ⊆Q t,T . However, in general, stability under pasting (4.2) may fail. It may be recovered if different definitions of Q t,T are used, e.g. with measures satisfying E[γ t,T (Q)] < ∞; see the remarks below on penalty functions associated with risk measures.
The additional structure resulting from Assumption 4.1 allows us to consider the question of whether for a fixed T > 0, the value field u(x; t, T ) associated with a general utility field satisfies itself the dynamic programming principle (2.14) for t ≤ T . We show that under suitable assumptions on the penalty function, this is the case. For particular choices of preferences, this property has been used to address the ambiguity-averse problem by stochastic control methods in [31,32,51]. The proof proceeds by first establishing appropriate consistency in the dual domain and then applying Theorem 3.3. For the case of standard (non-robust) utility maximisation and deterministic utility functions, it is well known that the value process satisfies the DPP, also referred to as the martingale optimality principle; see [19,Chap. I]. Proposition 4.2 shows that a similar consistency property holds for certain ambiguity-averse criteria. However, the value field associated with a general penalty function may fail to be dynamically consistent; see [64] for counterexamples. Hence, while the standard forward criteria are effectively a generalisation (to all positive times) of the value functions associated with (stochastic) utility functions, within the robust setting, our Definition 2.6 enforces additional structure by imposing the dynamic consistency requirement (2.12) on the pair (U, γ ). In general, however, this is weaker than the assumption of dynamic consistency of γ . Indeed, as illustrated by the next example, there are dynamically consistent pairs (U, γ ) where the penalty function γ itself is not dynamically consistent. Such robust forward criteria may lead to time-inconsistent optimal investment strategies.
Example 4.3 We work in the setting of Sect. 2.1. We set $\hat\lambda \equiv 0$ and fix a family of bounded random variables $(\lambda_{t,T})$ with $0 \le t \le T$, with each $\lambda_{t,T}$ being $\mathcal{F}_t$-measurable and $(\lambda_{t,T})^2 \le K$ for some $K > 0$. In turn, let
Let $U(x,t) := \ln x - \frac{t}{2}K$ and $\eta^{t,T}_u := 0$ for $u < t$ and $\eta^{t,T}_u := (\lambda_{t,T}, 0)$ for $t \le u \le T$. By definition, $\mathcal{Q}_{t,T} = \{Q^{\eta^{t,T}}\}$ and therefore, using classical results on logarithmic utility maximisation, we have that
We easily conclude that $(U,\gamma)$ is a robust forward criterion and that dynamic consistency holds.

Meanwhile, at time $t$ when considering the interval $[t,T]$, the resulting optimal strategy is given by $\bar\pi^{t,T}_u = \frac{\lambda_{t,T}}{\sigma_u} X^{\bar\pi^{t,T}}_u$, $t \le u \le T$. Even when considering classical robust portfolio optimisation on $[0,T]$, this may be time-inconsistent since we may have $\lambda_{t,T}/\sigma_u \ne \lambda_{u,T}/\sigma_u$ for $t \le u \le T$. In our context of forward criteria, when $T$ is not fixed, the "optimal strategy" might further be horizon-inconsistent in the sense that we may have $\bar\pi^{t,T}_t \ne \bar\pi^{t,T_1}_t$ for $t \le T < T_1$. Hence, the "optimal strategy" is not really a well-defined concept since it may depend not only on when we make the decision, but also on which horizon we consider. This is due to fundamental (time-)inconsistencies in the beliefs about feasible market models, manifested through a violation of (4.1).
Observe that in the above example, property (4.1) is violated in a rather simplistic way. Indeed, at any time t, looking to invest on [t, T ], the investor believes that only one model is feasible. This is a degenerate case since the choice of this model changes arbitrarily with t and T and there is no consistency requirement. Consider, for example, the extreme situation when all λ t,T are constant and T is fixed. Then at time zero, the investor picks possibly different models which she will choose to believe in when making investment decisions at t for the period [t, T ]; it is not surprising that this might lead to time-inconsistent investment strategies. However, the flexibility of fixing the penalty γ t,T implies that the dynamic consistency of the value functions, i.e., (2.14) on [0, T ] or (2.12) in general, may nevertheless be preserved.
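The dynamic consistency claimed in Example 4.3 can be checked by hand. The sketch below takes $U(x,t) = \ln x - \tfrac{t}{2}K$ and, as an assumption made here purely for illustration, supposes that the single feasible measure carries the penalty $\gamma_{t,T}(Q^{\eta^{t,T}}) = \tfrac12 (K - \lambda_{t,T}^2)(T-t) \ge 0$. It also assumes that the market price of risk under $Q^{\eta^{t,T}}$ equals $\hat\lambda + \lambda_{t,T} = \lambda_{t,T}$ on $[t,T]$, and writes $X^\pi$ for the wealth process started from $x$ at time $t$.

```latex
% Assumed penalty (illustration only): gamma_{t,T} = (K - lambda_{t,T}^2)(T-t)/2.
\begin{align*}
\sup_{\pi} E^{Q^{\eta^{t,T}}}\big[ \ln X^{\pi}_T \,\big|\, \mathcal{F}_t \big]
  &= \ln x + \tfrac12 \lambda_{t,T}^2 (T-t)
  && \text{(log utility, } \lambda_{t,T} \text{ is } \mathcal{F}_t\text{-measurable)},\\
u(x;t,T)
  &= \ln x + \tfrac12 \lambda_{t,T}^2 (T-t) - \tfrac{T}{2}K + \tfrac12 \big(K - \lambda_{t,T}^2\big)(T-t)\\
  &= \ln x - \tfrac{T}{2}K + \tfrac{T-t}{2}K
   = \ln x - \tfrac{t}{2}K = U(x,t).
\end{align*}
```

Under these assumptions the value no longer depends on $\lambda_{t,T}$, whereas the maximising strategy does. This is the mechanism by which the values can be dynamically consistent while the strategies are not.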
In Example 4.3, the lack of time-consistency of optimal strategies is inherited from the lack of dynamic consistency of the penalty functions, i.e., from the violation of (4.1). In contrast, when the penalty functions are consistent, we recover the time-consistency of the optimisers.

Proposition 4.4 Let $(U,\gamma)$ be a robust forward criterion such that Assumptions 3.2 and 4.1 hold. Moreover, assume that for each $0 \le t < T < \infty$ and $\xi \in L^\infty(\mathcal{F}_t)$, there is a saddle point $(\bar\pi^{t,T}(\xi), Q^{t,T}(\xi))$ for which $u(\xi;t,T)$ is attained (cf. (2.13)). Then the saddle point may be taken to be time-consistent in that Q t,
Furthermore, for $x > 0$, there exist a process $\bar\pi_t$, $t \ge 0$, and a positive martingale $Y_t$,

The above result, combined with Example 4.3, shows that dynamic consistency of the penalty functions, i.e., (4.1), is a necessary and sufficient condition for time consistency of the optimal investment strategies to hold for any corresponding criterion. This applies both to the robust forward criteria studied here as well as to classical robust expected utility maximisation on a fixed horizon. It leads to interesting open questions. First, the economic and empirical justification for (4.1) remains unclear. In fact, it is a non-trivial requirement, and, for example, penalty functions associated to convex risk measures do not satisfy (4.1) in general; see also Remark 3.5 in Schied [64]. Second, are there generalisations of the optimisation problem in (2.13) which would preserve time-consistency of optimal strategies while still violating (4.1)?
Next, we show that the dynamic consistency property of penalty functions leads to a characterisation of robust forward criteria in terms of a certain "weighted submartingale" property of the dual field. This is used in Sect. 5 to derive an equation allowing us to investigate particular classes and examples of robust forward criteria.

for all $Z \in \mathcal{Z}^a_T$ and $Q \in \mathcal{Q}_{t,T}$; moreover, there exist $Z \in \mathcal{Z}^a$ and a positive martingale $Y_t$, $t \ge s$, such that for all $s \le t \le T < \infty$, (4.3) holds with equality for
We conclude this section with brief remarks on the penalty functions γ t,T associated with (dynamic) convex risk measures (see [1,7,11,43]). Such penalty functions, under minimal regularity/continuity assumptions, satisfy the properties of Definition 2.5. However, the weak compactness condition in Assumption 3.2 usually requires stronger assumptions. Recall that for static risk measures, it is obtained for risk measures continuous from below, see [64,Lemma 4.1], and in particular by a coherent risk measure which only assigns zero penalty to equivalent measures; see [31] for an example. Regarding Assumption 4.1, the time-consistency of convex risk measures is characterised by property (4.1), and any time-consistent coherent risk measure 6

The structure of robust forward criteria and representative cases
In this section, we study the structure of robust forward criteria and subsequently discuss specific cases. Throughout, we consider the Brownian setup of Sect. 2.1, and the discussion is mostly formal. We start with the structure of forward criteria and focus on the non-uniqueness of robust forward criteria for given initial preferences. Then we study examples of classes where the uniqueness may be recovered. These classes are obtained by generalising, in various ways, the main example studied in Sect. 2.1. First, in Sect. 5.2, we consider fields which exhibit logarithmic dependence on wealth. Then, in Sect. 5.3, we focus on robust forward criteria with no volatility (cf. (5.6) below). Such criteria are characterised by a specific evolutionary property and linked to a certain PDE (Eq. (5.7) below). For both examples, the discussion is in terms of dual fields. Finally, in Sect. 5.4, we show that for each robust forward criterion, there exists a (standard) forward criterion in the fixed reference market producing the same optimal behaviour.

The structure and non-uniqueness of robust forward criteria
In the standard model-specific setting, the forward performance criteria (see [53,54]) are not uniquely specified from the initial condition. This is due to the flexibility of the investor to choose the volatility of her criterion. Indeed, a (standard) forward performance criterion (admitting an Itô decomposition) satisfies the SPDE dU (x, t) = 1 2 where a(x, t) is a parameter-dependent process (see below). At a formal level, this is an immediate consequence of an application of the Itô-Ventzell formula; see [54]. Similarly, the value process in the classical EUM problem satisfies (under appropriate regularity assumptions) the SPDE (5.1) on the interval [0, T ). However, the equation is then equipped with a terminal condition U(x, T ) = U(x) and constitutes a backward SPDE; see e.g. [48]. For a given terminal condition U(x), when recovering the value process from this backward SPDE, the (unique) solution consists of the pair (U (x, t), a(x, t)) which are both simultaneously obtained. Due to the volatility component a(x, t), there might, however, exist multiple stochastic terminal conditions for all of which U( · , 0) coincide. Put differently, for a given initial condition u 0 (x), the forward SPDE (5.1) might have multiple solutions which are catalogued by their volatility a(x, t). In the forward approach, it is then down to the investor herself to specify this volatility. In total analogy, within the robust setting and for a fixed penalty function, in order to specify robust forward criteria uniquely, we expect the need for further conditions. These could be either on the form of the primal/dual field or on the choice of volatility structure. We discuss both below. From the financial perspective, compared with classical utility maximisation, the forward formulation considers different inputs to the investment problem, for the standard as well as the robust case. 
In the classical setup, the investor's preferences are fully characterised via the spatial behaviour of the utility function at a future date, and the rest is derived. In the forward setting, the fixed inputs are the initial condition u 0 (x) and the requirement of dynamic consistency. In order to pin down a unique criterion, the investor then needs to specify additional evolutionary properties of the utility field.

A class of logarithmic robust forward criteria
We start by preserving the logarithmic dependence on wealth seen in the main motivating example in Sect. 2.1. For this, we need to consider nonnegative wealth, and since our main results were obtained for utility fields defined on the whole real line, the discussion is formal. A direct computation shows that up to a constant shift, the dual field corresponding to $U$ given in (2.5) is V (y, t) = − ln y +
for some processes $(b_t)$ and $(a_t)$ which do not depend on $y$. Further, we assign to the measure $Q^\eta$ (cf. (2.2)) the penalty⁷
is proper, convex and lower semicontinuous, and satisfies the coercivity condition $g_t(\eta) \ge -a + b|\eta|^2$ for some constants $a$ and $b$ (cf. (8.6) in [27]). For example, taking $g_t(\eta) = |\eta|^2$ for $|\eta| \le g$, $g > 0$, and $g_t(\eta) = \infty$ otherwise ensures that $(\gamma_{t,T})$ satisfies⁸ both Assumptions 3.2 and 4.1; a different quadratic penalty was considered in (2.3). We let Q = T >0 Q 0,T . We assume that $(\lambda_t)$ is in P and let Z ν
In particular, the assumption of NFLVR on finite horizons implies that $Z^\nu \in \mathcal{Z}^e$ for $\nu_t \equiv 0$.

Following Proposition 4.5, in order for the pair $(V,\gamma)$ to satisfy (3.2), we expect that for any $Z^\nu \in \mathcal{Z}^e$ and $Q^\eta \in \mathcal{Q}$, the process
is a $Q^\eta$-submartingale, and that there are $\nu^*$ and $\eta^*$ for which it is a martingale. We recall that $Q^\eta$ is specified via $\frac{dQ^\eta}{dP}\big|_{\mathcal{F}_t} = D^\eta_t$, with $D^\eta_t$ given in (2.2). A straightforward application of the Itô-Ventzell formula and formal minimisation over $\nu_t$ yields that in order for $(M^{\eta\nu}_t)$ to satisfy this condition, the processes $(a_t)$ and $(b_t)$ must satisfy the relation
We see that for a given initial condition and a fixed penalty $g_t(\,\cdot\,)$, a specification of the volatility process $(a_t)$ typically leads to a unique robust forward criterion, for the drift is then specified via (5.5). In particular, for the choice of $a_t \equiv 0$ and $g_t(\eta) = \delta_t|\eta|^2/2$, we recover $b_t = -\frac12 \frac{\delta_t}{1+\delta_t}\hat\lambda_t^2$ as expected. Another approach to pin down a unique $U$ might be to consider fields which are Markovian.
For example, within a (Markovian) stochastic factor model, one could require that U is represented as a deterministic function of the underlying factors. This function must then solve a specific equation, closely related to the HJB equation associated with the classical value function within the same factor model. However, in the forward setting, the equation has to be solved forward in time and is therefore ill-posed. We refer to [55] for a study of such criteria in a model-specific setup.
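The closed form for $b_t$ in the quadratic case can be sanity-checked numerically. The sketch below is a formal illustration under assumptions that are not spelled out in the text: for a logarithmic investor, the instantaneous gain under a measure that shifts the market price of risk from $\hat\lambda$ to $\hat\lambda + \eta^1$ is taken to be $\tfrac12(\hat\lambda + \eta^1)^2$, the penalty rate is $\tfrac{\delta}{2}|\eta^1|^2$ (the second component of $\eta$ is set to zero, since it is penalised but does not change the gain), and the drift $b$ is minus the minimum of their sum. All function names are hypothetical.

```python
def robust_rate(eta, lam_hat, delta):
    """Assumed log-investor gain rate plus quadratic penalty rate for a drift shift eta."""
    return 0.5 * (lam_hat + eta) ** 2 + 0.5 * delta * eta ** 2


def minimise_convex(f, lo, hi, tol=1e-10):
    """Ternary search for the minimiser of a strictly convex function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def worst_case_shift(lam_hat, delta):
    """Numerically minimise the robust rate over the drift shift eta."""
    bound = abs(lam_hat) + 1.0  # the minimiser lies between -lam_hat and 0
    eta_star = minimise_convex(lambda e: robust_rate(e, lam_hat, delta), -bound, bound)
    return eta_star, robust_rate(eta_star, lam_hat, delta)


if __name__ == "__main__":
    lam_hat, delta = 0.4, 3.0  # illustrative values only
    eta_star, min_rate = worst_case_shift(lam_hat, delta)
    print("worst-case shift  :", eta_star, "closed form:", -lam_hat / (1.0 + delta))
    print("adjusted MPR      :", lam_hat + eta_star,
          "closed form:", delta / (1.0 + delta) * lam_hat)
    print("drift b = -minimum:", -min_rate,
          "closed form:", -0.5 * delta / (1.0 + delta) * lam_hat ** 2)
```

In this sketch the adjusted market price of risk is $\frac{\delta}{1+\delta}\hat\lambda$: a large $\delta$ (high confidence in the estimate) leaves $\hat\lambda$ almost unchanged, while $\delta$ close to zero shrinks it towards zero, in line with the fractional Kelly interpretation.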

A class of robust forward criteria with zero volatility
In the previous section, we extended the example of Sect. 2.1 by adding a volatility term (the stochastic integral in (5.2)) to the representation of the primal (or dual) field. Here, we generalise it in a different direction: we keep zero volatility, but drop the specific (logarithmic) dependence on wealth. Specifically, considering utility fields defined on $\mathbb{R}$, we are interested in all criteria for which the volatility of the dual field is identically zero, i.e.,
We refer to this class as non-volatile or time-monotone criteria. For standard forward criteria, this additional assumption specifies an interesting class of preferences; we refer the reader to [6,53] for further details. Similarly as for the example given in Sect. 5.2, a straightforward application of the Itô-Ventzell formula and formal minimisation over $\nu_t$ yields that in order for $(M^{\eta\nu}_t)$ (cf. (5.4)) to be a submartingale for each choice of $\nu$ and $\eta$, and a martingale at the optimum, the random convex function $V(y,t)$ must solve the equation
This is a random PDE, as opposed to the SPDE we obtained before. Note that (5.7) implies that non-volatile criteria are in fact monotone in time, which justifies the terminology. We studied an instance of this equation in Sect. 2.1 when the criterion was both logarithmic and non-volatile; the appropriate form of the criterion (2.5) could formally be obtained by substituting the dual ansatz $V(y,t) = -\ln y + \int_0^t b_s\,ds$ into either of Eqs. (5.5) or (5.7).

Equation (5.7) might be viewed as a (dual) Hamilton-Jacobi-Bellman equation. In particular, a verification theorem stating that every well-behaved (convex) solution to (5.7) constitutes a robust forward criterion might be proved. However, proving existence or explicitly solving this equation is hard. In order to illustrate this, consider the case of no model uncertainty, which corresponds to $g_t(\eta) = \infty$ for $\eta \ne 0$. Then Eq. (5.7) reduces to the random equation
This equation characterises standard non-volatile criteria in a model with market price of risk $(\lambda_t)$. Equation (5.8), see [6,53], is closely related to the (ill-posed) backward heat equation whose solutions only exist for a specific class of initial conditions, as characterised by Widder's theorem. We easily see that Eq. (5.7) inherits difficulties related to the equation being ill-posed, but in addition it is fully nonlinear. Moreover, we also need to ensure that its solution is adapted.

Equivalent standard (non-robust) forward criteria
We conclude with some remarks on the existence of equivalent forward criteria within a non-robust setting. First, returning to the example in Sect. 2.1, we observe that the optimal strategy $\bar\pi$ in (2.4) can be interpreted as the Kelly strategy in an auxiliary market where $\tilde\lambda := \hat\lambda + \bar\eta^1 = \frac{\delta}{1+\delta}\hat\lambda$, i.e., where the market price of risk the investor considers most likely is adjusted by her trust in the estimation. This is an instance of a general phenomenon. Indeed, if a robust forward criterion $(U,\gamma)$, with penalty function given by (5.3), admits a (consistent) saddle point for all $t \le T < \infty$, say $(\bar\pi, \bar\eta)$, then this robust criterion produces the same investment strategy as does the standard forward criterion $\tilde U$ specified in a fictitious market with market price of risk $\tilde\lambda_t = \hat\lambda_t + \bar\eta^1_t$ for $t \ge 0$. In turn, an application of Bayes' rule implies that the optimal strategy associated with this criterion is also optimal for a forward criterion specified in the reference market, namely $D^{\bar\eta}_t \tilde U(x,t)$.

Note that if $U(x,t)$ is a non-volatile criterion, then $D^{\bar\eta}_t \tilde U(x,t)$ is in general volatile (cf. Theorem 4 in [52] for examples).
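The Bayes' rule step can be made explicit. The following formal sketch assumes that $D^{\bar\eta}$ is the strictly positive $P$-density process of $Q^{\bar\eta}$, i.e. $dQ^{\bar\eta}/dP|_{\mathcal{F}_t} = D^{\bar\eta}_t$, that all conditional expectations are well defined, and that forward criteria are understood in the usual supermartingale/martingale sense along wealth processes; $X^\pi$ denotes the wealth process of a strategy $\pi$, a notation introduced here for the illustration.

```latex
% Bayes' rule for t <= T and an F_T-measurable random variable Y:
\[
E^{Q^{\bar\eta}}\big[ Y \,\big|\, \mathcal{F}_t \big]
  = \frac{1}{D^{\bar\eta}_t}\, E\big[ D^{\bar\eta}_T\, Y \,\big|\, \mathcal{F}_t \big].
\]
% Apply it with Y = \tilde U(X^\pi_T, T) and multiply through by D^{\bar\eta}_t > 0:
\[
E^{Q^{\bar\eta}}\big[ \tilde U(X^\pi_T,T) \,\big|\, \mathcal{F}_t \big] \le \tilde U(X^\pi_t,t)
\iff
E\big[ D^{\bar\eta}_T\, \tilde U(X^\pi_T,T) \,\big|\, \mathcal{F}_t \big] \le D^{\bar\eta}_t\, \tilde U(X^\pi_t,t),
\]
% with equality on one side if and only if there is equality on the other.
```

Hence $\tilde U(X^\pi_t,t)$ is a $Q^{\bar\eta}$-supermartingale for every $\pi$ and a martingale for $\bar\pi$ exactly when $D^{\bar\eta}_t \tilde U(X^\pi_t,t)$ has the same properties under $P$, so both criteria single out the same strategy.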
For the class of robust forward criteria for which the above formalism can be made rigorous, the following holds: If the robust forward criterion admits an optimal strategy, then that strategy is optimal also for a specific standard (non-robust) forward criterion viewed in the reference market. Naturally, the latter criterion is defined in terms of the optimal (η t ), which is part of the solution to the robust problem and not a priori known. Nevertheless, on a more abstract level, this implies that viewed as a class of preference criteria, forward criteria can be argued to be "closed" under the introduction of a certain type of model uncertainty. For a similar conclusion in terms of the use of different numeraires, see [21,Theorem 2.5] or [20,Sect. 5.1]. An analogous result was proved for stochastic differential utilities in [67]. In both cases, the results rely on the notions being general enough to allow stochastic preferences. The advantage of properly formulating robust forward criteria is the resulting ability to disentangle the impact on the preferences originating from risk and model ambiguity; see Sect. 2.1. In consequence, the inverse question to the above observations appears to be of great interest: Under what conditions can a given (volatile non-robust) forward criterion be written as a non-volatile robust forward criterion with respect to some non-trivial penalty function?
Finally, we remark that our analysis here, and thus the above discussion, is restricted to measures equivalent to P. Considering absolutely continuous measures introduces further complexity (cf. [64] for the static case), but should not alter the main conclusions; see also the remarks in Sect. 3.1. In contrast, considering a larger set of possibly mutually singular measures would require new insights; see [16,57].

Proofs of Theorem 3.3 and Proposition 3.4
Throughout Sect. 6.1, we consider a pair (U, γ ) of a utility random field and an admissible family of penalty functions and the associated dual field V given in (3.1). Further, we consider the arbitrary but fixed time points 0 ≤ t ≤ T < ∞. We start by introducing relevant notation from Zitković [69] since we then apply the duality from there in our proofs; see (6.6) below. Then, in Sect. 6.1.1, we prove conjugacy relations and existence of a dual optimiser for a specific auxiliary problem. In Sect. 6.1.2, Theorem 3.3 and Proposition 3.4 are proved via a reduction to this auxiliary problem.
The spaces $L^p$, $p \in [0,\infty]$, are defined with respect to $(\Omega, \mathcal{F}_T, P|_{\mathcal{F}_T})$; the space $L^1$ is identified with its image in $(L^\infty)^*$ under the isometric embedding of a Banach space into its bidual.
Let $K_{t,T} := \{\int_t^T \pi_s\,dS_s : \pi \in \mathcal{A}_{bd}\}$ and $C_{t,T} := (K_{t,T} - L^0_+) \cap L^\infty$. The optimisation over $K_{t,T}$ in (2.13) can then be replaced by optimisation over $C_{t,T}$. Given $Q \in \mathcal{Q}_{t,T}$ and a random variable $\kappa \in L^\infty_+(\mathcal{F}_t)$ (we typically consider $\kappa = 1_A$, $A \in \mathcal{F}_t$, and use it to localise arguments to a set), we then introduce the function
Next, let $D_{t,T} := \{\zeta^* \in (L^\infty)^* : \langle \zeta^*, \zeta \rangle \le 0 \text{ for all } \zeta \in C_{t,T}\}$, and for $\eta \in L^1_+(\mathcal{F}_t)$, let D
for some $\eta \in L^1_+(\mathcal{F}_t)$ and $Z \in \mathcal{Z}^a_T$. Note that the proof of this result uses that the market satisfies NFLVR on finite horizons. Define the function $V^Q_\kappa$ :
and the function $v^Q_\kappa$ :
Finally, we introduce the auxiliary value functions $u_\kappa : L^\infty(\mathcal{F}_t) \to (-\infty,\infty]$ and $v_\kappa : L^1(\mathcal{F}_t) \to (-\infty,\infty]$ given, respectively, by

Results for the auxiliary value functions u κ and v κ
We establish in this section results for the auxiliary value functions u κ and v κ introduced above. First, we consider the existence of a dual optimiser.
Next, note that $D^\eta_{t,T} \subseteq (L^\infty)^*$ is included in a ball of size $\langle \eta, 1 \rangle$ with respect to the operator norm, and such balls are weak* compact according to the Banach-Alaoglu theorem. For any net $(\zeta^*_\alpha)_{\alpha\in A}$ in $D^\eta_{t,T}$, where $A$ is some directed set, there thus exists a subnet, which we still label by $(\zeta^*_\alpha)_{\alpha\in A}$, converging in the weak* topology to some $\bar\zeta^* \in (L^\infty)^*$. Since $D_{t,T}$ clearly is weak* closed, $\bar\zeta^* \in D_{t,T}$. Further, since for any $\xi \in L^\infty(\mathcal{F}_t)$, $\langle \bar\zeta^*, \xi \rangle = \lim_\alpha \langle \zeta^*_\alpha, \xi \rangle = \langle \eta, \xi \rangle$, we have that $\bar\zeta^* \in D^\eta_{t,T}$; in consequence, $D^\eta_{t,T}$ is weak* compact. Recall that $\{Z^Q_{t,T} : Q \in \mathcal{Q}_{t,T}\}$ is weakly compact by assumption.
Fix $\zeta \in L^\infty$ and recall that the set $\{Z^Q_{t,T} U^-(\zeta,T) : Q \in \mathcal{Q}_{t,T}\}$ is uniformly integrable. The set $\{Z^Q_{t,T} : Q \in \mathcal{Q}_{t,T} \text{ and } E[\kappa Z^Q_{t,T} U(\zeta,T)] \le c\}$ is convex. Further, using the above uniform integrability and Fatou's lemma, it is closed in $L^1$ and hence by convexity also weakly closed. It follows that $Z \mapsto E[\kappa Z U(\zeta,T)]$ is weakly lower semicontinuous on the weakly compact set $\{Z^Q_{t,T} : Q \in \mathcal{Q}_{t,T}\}$. Next, $\zeta^* \mapsto \langle \zeta^*, \zeta \rangle$, $\zeta \in L^\infty$, is trivially continuous with respect to the weak* topology. Since the pointwise supremum preserves lower semicontinuity, we thus obtain joint lower semicontinuity of the mapping $(\zeta^*, Z^Q_{t,T}) \mapsto V^Q_\kappa(\zeta^*)$ with respect to the product topology on $D^\eta_{t,T} \times \{Z^Q_{t,T} : Q \in \mathcal{Q}_{t,T}\}$. Combined with the assumed lower semicontinuity of the mapping $Z \mapsto E[\kappa\gamma_{t,T}(Z)]$ (see Definition 2.5), this implies the existence of a minimiser $(\zeta^*, Z)$ for which $v_\kappa(\eta)$ is attained.
The convexity of $v_\kappa(\eta)$ follows immediately from the joint convexity of the mapping
In order to establish lower semicontinuity of $v_\kappa(\eta)$, we take a directed set $A$ and a net $(\eta_\alpha)_{\alpha\in A}$ in $L^1_+$ with $\eta_\alpha \to \eta$ weakly. By the above, we can pick
Thanks to the weak compactness of the set of conditional densities, passing to a subnet, $(\zeta^*_\alpha, Z^*_\alpha)$ converges in the product topology to some element $(\zeta^*, Z)$ in the set
, it follows that $\zeta^* \in D^\eta_{t,T}$. The lower semicontinuity of $v_\kappa(\eta)$ then follows from the joint lower semicontinuity of the mapping $(\zeta^*, Z)$

In order to establish the conjugacy relations for $u_\kappa$ and $v_\kappa$, we first recall a result from [69]. To this end, take $\kappa \in L^\infty_+(\mathcal{F}_t)$ and $Q \in \mathcal{Q}_{t,T}$ and consider the auxiliary stochastic utility function $\tilde U(x,T) := Z^Q_{t,T} U(x,T)$, $x \in \mathbb{R}$, with convex conjugate $\tilde V(y,T) = Z^Q_{t,T} V(y/Z^Q_{t,T}, T)$, $y \ge 0$. Suppose that $\kappa \tilde U(x,T) \in L^1$, $x \in \mathbb{R}$, and that the second part of Assumption 3.2 holds. Then we may apply Propositions A.1 and A.3 in [69] to obtain
According to (6.1), for each $\zeta^* \in D_{t,T} \cap L^1_+$, there exists $\eta \in L^1_+(\mathcal{F}_t)$ such that $\zeta^* \in D^\eta_{t,T}$. Combined with the definitions of $V^Q_\kappa$ and $v^Q_\kappa$, (6.5) hence implies

We now establish the conjugacy relations between $u_\kappa$ and $v_\kappa$. This result is the cornerstone in the proof below of the conditional versions in Theorem 3.3. As in previous works, see e.g. [60,64,65], we use a minimax theorem in order to reformulate the robust problem as the infimum over a class of non-robust criteria. We then apply duality to each of the inner maximisation problems. Unlike Schied [64], who used the EUM duality results of Kramkov and Schachermayer [45], we apply the relation (6.6) to suitably defined stochastic utility fields considered under the fixed reference measure. This is of technical as well as conceptual importance and makes key use of Assumption 3.2.

Proposition 6.2 Suppose that Assumption 3.2 holds and let
Then for all $\xi \in L^\infty(\mathcal{F}_t)$ and $\eta \in L^1_+(\mathcal{F}_t)$, it holds that

Proof By exploiting properties (i) and (ii) of Definition 2.5 and the same arguments as used in the proof of Proposition 6.1 to establish lower semicontinuity of the function in (6.4), we obtain that for $\xi \in L^\infty(\mathcal{F}_t)$,

is convex and weakly lower semicontinuous on the convex and weakly compact set

is concave on the convex set $\mathcal{C}_{t,T}$. Hence the assumptions of [24, Thm. 2] are satisfied, and applying that result yields

where the last equality follows directly from the definition of $u^Q_\kappa$. Next, note that due to concavity, if $U(x_0, T) \in L^1$ for some $x_0 \in \mathbb{R}$, then $U(x, T) \in L^1$ for all $x \in \mathbb{R}$. Now, using the convention $\inf \emptyset = \infty$, without loss of generality, we may replace the set $\mathcal{Q}_{t,T}$ in (6.7) by $\mathcal{Q}^\kappa_{t,T} := \{Q \in \mathcal{Q}_{t,T} : \kappa Z^Q_{t,T} U(x, T) \in L^1, \ x \in \mathbb{R}\}$. In turn, by Assumption 3.2 and the discussion preceding this proof, for each $Q \in \mathcal{Q}^\kappa_{t,T}$, the conjugacy relation (6.6) applies and we obtain

indeed, to see the last equality, recall that $v^Q_\kappa(\eta)$, $\eta \in L^1_+(\mathcal{F}_t)$, is given by (6.3) with $V^Q_\kappa$ admitting the representation (6.4), which implies that $\mathcal{Q}^\kappa_{t,T}$ may be replaced by $\mathcal{Q}_{t,T}$ in the second line above.
To establish that $v_\kappa$ is also the convex conjugate of $u_\kappa$, it now suffices to argue that $v_\kappa$ is convex and weakly lower semicontinuous, which follows from Proposition 6.1.

Proof of Theorem 3.3 and Proposition 3.4
We are now ready to prove the main results of Sect. 3.1. Our setting is dynamic, which in this generality appears novel even in the context of the classical robust EUM; compare e.g. Schied [64]. We proceed by reducing the conditional formulations to the auxiliary problem studied in Sect. 6.1.1. This is done with the help of the following lemma which uses crucially that our penalty functions satisfy condition (2.11). For $\kappa \in L^\infty_+(\mathcal{F}_t)$ and $\xi \in L^\infty(\mathcal{F}_t)$, we define

Lemma 6.3 Suppose that Assumption 3.2 holds. Given $\kappa \in L^\infty_+(\mathcal{F}_t)$, $\xi \in L^\infty(\mathcal{F}_t)$ and $g \in \mathcal{C}_{t,T}$, it then holds that

Proof The inequality "≤" is trivial. To show "≥", define $J(Q) := J_{\kappa,\xi}(Q, g)$ for $Q \in \mathcal{Q}_{t,T}$. It suffices to argue that the set $\{J(Q) : Q \in \mathcal{Q}_{t,T}\}$ is downward directed because by Neveu [56, Proposition VI.1.1], there is then a sequence $(Q^n) \subseteq \mathcal{Q}_{t,T}$ such that $(J(Q^n))$ decreases to $\operatorname{ess\,inf}_{Q \in \mathcal{Q}_{t,T}} J(Q)$. The result then follows by using monotone convergence. To argue the directedness, let $Q^1, Q^2 \in \mathcal{Q}_{t,T}$, define the set $A := \{J(Q^1) \le J(Q^2)\} \in \mathcal{F}_t$ and let the measure $\bar{Q}$ be given by

Using property (2.11), we have $\gamma_{t,T}(\bar{Q}) = 1_A \gamma_{t,T}(Q^1) + 1_{A^c} \gamma_{t,T}(Q^2)$. So $\bar{Q} \in \mathcal{Q}_{t,T}$ and $J(\bar{Q}) = \min\{J(Q^1), J(Q^2)\}$ a.s. In consequence, the set $\{J(Q) : Q \in \mathcal{Q}_{t,T}\}$ is closed under minimisation and thus downward directed.
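The pasting step in the proof of Lemma 6.3 can be illustrated on a finite sample space. The sketch below is a toy under stated assumptions: the atoms of $\mathcal{F}_t$ are list indices, the lists `j1` and `j2` stand in for the conditional values $J(Q^1)$ and $J(Q^2)$, and the locality property that the lemma derives from (2.11) is simply assumed, so that the pasted measure has the pasted value. All names are hypothetical.

```python
def paste(j1, j2):
    """Paste two families of conditional values atom by atom.

    On A = {J(Q1) <= J(Q2)} the value of Q1 is kept, elsewhere that of Q2.
    Returns the indicator of A and the pasted values.
    """
    in_A = [a <= b for a, b in zip(j1, j2)]
    pasted = [a if flag else b for a, b, flag in zip(j1, j2, in_A)]
    return in_A, pasted

def expectation(values, probs):
    """Expectation of a random variable given atom probabilities."""
    return sum(v * p for v, p in zip(values, probs))

def running_paste(family):
    """Paste a finite family one member at a time.

    The resulting sequence is pointwise decreasing and ends at the pointwise
    minimum, which is the essential infimum of a finite family.
    """
    current = list(family[0])
    sequence = [current]
    for member in family[1:]:
        _, current = paste(current, member)
        sequence.append(current)
    return sequence

if __name__ == "__main__":
    probs = [0.2, 0.5, 0.3]
    j1, j2 = [1.0, 3.0, 2.0], [2.0, 1.0, 2.0]
    in_A, pasted = paste(j1, j2)
    print(in_A, pasted)
    print(expectation(pasted, probs), expectation(j1, probs), expectation(j2, probs))
```

Because the pasted values coincide with the pointwise minimum, their expectation is no larger than that of either input, which is the mechanism that lets expectation and essential infimum be interchanged along a decreasing sequence.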
First, we establish the existence of a dual optimiser.
We now turn to Theorem 3.3. The proof proceeds by assuming that the conditional conjugacy relations do not hold; taking expectations and applying Proposition 6.2 and Lemma 6.3, we then obtain a contradiction which allows us to conclude.
Proof of Theorem 3.3 First, we consider assertion (3.4). In order to verify that the (weak) inequality "≤" holds, note that we trivially have the inequality

$u(\xi; t, T) \le \operatorname{ess\,inf} \cdots$

Since $E^Q[g] \le 0$ for all $Q \in \mathcal{M}^a_T$, $g \in \mathcal{C}_{t,T}$ and $U(x, T) \le V(y, T) + xy$ a.s. for all $x \in \mathbb{R}$, $y \ge 0$, it follows immediately from (6.9) that for all $\eta \in L^0_+(\mathcal{F}_t)$,

$u(\xi; t, T) \le \operatorname{ess\,inf}_{Q \in \mathcal{Q}_{t,T}} \operatorname{ess\,inf}_{Z \in \mathcal{Z}^a_T} \big( E^Q[V(\eta Z_{t,T} / Z^Q_{t,T}, T) \,|\, \mathcal{F}_t] + \xi\eta + \gamma_{t,T}(Q) \big) = v(\eta; t, T) + \xi\eta.$ (6.10)

Next, we argue that the inequality "≥" holds in (3.4) with the infimum on the right-hand side taken over $L^1$; this trivially yields the claim. So assume to the contrary that there exist $\xi \in L^\infty(\mathcal{F}_t)$, $\varepsilon > 0$ and $A \in \mathcal{F}_t$ with $P[A] > 0$ such that

$\cdots \; E^Q[V(\eta Z_{t,T} / Z^Q_{t,T}, T) \,|\, \mathcal{F}_t] + \gamma_{t,T}(Q) + \xi\eta$

for all $g \in \mathcal{K}_{t,T}$, $Z \in \mathcal{Z}^a_T$, $Q \in \mathcal{Q}_{t,T}$ and $\eta \in L^1_+(\mathcal{F}_t)$. Observe that $u(\xi; t, T) < \infty$ a.s. on $A$ and without loss of generality, we may assume that there is $M < \infty$ such that $u(\xi; t, T) \le M$ a.s. on $A$. Multiplying the latter inequality by $\kappa = 1_A$, taking expectations on both sides and applying Lemma 6.3, we then obtain

for any $\eta \in L^1_+$ and $Z \in \mathcal{Z}^a_T$ such that $\{\eta Z_{t,T} > 0\} \subseteq A$. According to (6.1), we have that for every $\zeta^* \in \mathcal{D}^\eta_{t,T} \cap L^1_+$ with $\eta \in L^1_+(\mathcal{F}_t)$, there exists $Z \in \mathcal{Z}^a_T$ such that $\zeta^* = \eta Z_{t,T}$. Using this and taking the supremum over $g \in \mathcal{K}_{t,T}$, we deduce that

Therefore, for any $\eta \in L^1_+(\mathcal{F}_t)$ and $Q \in \mathcal{Q}_{t,T}$, the above inequality holds for all $\zeta^* \in \mathcal{D}^\eta_{t,T}$. Indeed, if $\zeta^* \notin L^1_+$ or $\{\zeta^* > 0\} \not\subseteq A$, then it holds that $V^Q_\kappa(\zeta^*) = \infty$ (cf. (6.2)). Hence,

for all $\eta \in L^1_+(\mathcal{F}_t)$ and $Q \in \mathcal{Q}_{t,T}$. In turn, since $u_\kappa(\xi) \le M < \infty$ due to the above choice of $\kappa$, we obtain

which according to Proposition 6.2 yields the required contradiction.
Next, we turn to relation (3.5). Note that assertion (3.4) implies that for any $\eta \in L^0_+(\mathcal{F}_t)$ and $\xi \in L^\infty(\mathcal{F}_t)$, we have $v(\eta; t, T) \ge u(\xi; t, T) - \xi\eta$. Hence the inequality "≥" follows directly. For $\eta \in L^1_+(\mathcal{F}_t)$, the reverse inequality follows by similar arguments as above, specifically, by arguing by contradiction and applying Lemma 6.3 and Proposition 6.2. In turn, for $\eta \in L^0_+(\mathcal{F}_t)$ and $A \in \mathcal{F}_t$,

for any $Q \in \mathcal{Q}_{t,T}$ and $Z \in \mathcal{Z}^a_T$; it follows from the definition of $v(\,\cdot\,; t, T)$ that $1_A v(\eta; t, T) = 1_A v(1_A \eta; t, T)$ a.s. For an arbitrary $\eta \in L^0_+(\mathcal{F}_t)$, we may then define $A_n := \{\eta \le n\}$ and $\eta_n := \eta 1_{A_n} \in L^1_+(\mathcal{F}_t)$, $n \in \mathbb{N}$. By using the identity $1_{A_n} v(\eta; t, T) = 1_{A_n} v(\eta_n; t, T)$ and applying (3.5) to $\eta_n$, we then obtain that (3.5) holds for $\eta$ on $A_n$ for any $n \in \mathbb{N}$. Since $\eta$ takes finite values a.s., we thus obtain that (3.5) holds a.s.

Proof of Propositions 4.2, 4.4 and 4.5
In order to prove the results in Sect. 4, we first establish two lemmas. Throughout this section, we write $\gamma_{0,t}(Q) := \gamma_{0,t}(Q|_{\mathcal{F}_t})$ for $Q \in \mathcal{Q}_{0,T}$.

Lemma 6.4 Let $(U, \gamma)$ be a pair of a utility random field and an admissible family of penalty functions with associated dual field $V$. Given $T > 0$, let $v(x; t, T)$ be the corresponding dual value field. Suppose that the infimum in (3.2) is attained for any $t \le T$ and $\eta \in L^1_+(\mathcal{F}_t)$, and that either Assumption 4.1 holds or (4.1) holds and $v^-(\zeta; t, T) \in L^1(\mathcal{F}_t; Q)$ for $\zeta \in L^0(\mathcal{F}_t)$ and $Q \in \tilde{\mathcal{Q}}_{0,T}$, $t \le T$. Then the pair $(v, \gamma)$ is dynamically consistent on the interval $[0, T]$.

Proof Let $0 \le s < t < T < \infty$ and take $\eta \in L^1_+(\mathcal{F}_s)$, $Z \in \mathcal{Z}^a_t$ and $Q \in \mathcal{Q}_{s,t}$. By using similar arguments as in the proof of Lemma 6.3, we obtain that the optimisation set in (3.3) is downward directed, and so there exists a sequence $(Z^n, Q^n) \subseteq \mathcal{Z}^a_T \times \mathcal{Q}_{t,T}$ such that the objective function evaluated at $(Z^n, Q^n)$, $n \in \mathbb{N}$, decreases to $v(\eta Z_{s,t} / Z^Q_{s,t}; t, T)$. By using monotone convergence, we then obtain

where we used that $E[Z_t Z^n_{t,T} \,|\, \mathcal{F}_u]$, $u \le T$, belongs to $\mathcal{Z}^a_T$ and that $\bar{Q} \in \mathcal{Q}_{s,T}$ for $\frac{d\bar{Q}}{dP}\big|_{\mathcal{F}_T} = Z^Q_t Z^{Q^n}_{t,T}$. Indeed, (4.2) yields immediately that $\bar{Q} \in \mathcal{Q}_{s,T}$. For the case when (4.1) holds and $v^-(\zeta; s, T) \in L^1(\mathcal{F}_T; \bar{Q})$ for $\zeta \in L^0(\mathcal{F}_T)$, the fact that $v(\eta; s, t)$ is finite implies without loss of generality that $E^Q[\gamma_{t,T}(Q^n) \,|\, \mathcal{F}_s] < \infty$, and thus $\bar{Q} \in \mathcal{Q}_{s,T}$.
Next, let $Z \in \mathcal{Z}^a_T$ and $Q \in \mathcal{Q}_{s,T}$ be optimal objects for which the infimum in $v(\eta; s, T)$ is attained. From (4.1) we deduce that $Q \in \mathcal{Q}_{t,T}$ and $Q|_{\mathcal{F}_t} \in \mathcal{Q}_{s,t}$. It follows that

$v(\eta; s, T) = \cdots \ge v(\eta; s, T),$ (6.12)

where the last inequality is due to (6.11). Hence equality must hold throughout. Finally, the fact that property (3.2) must hold also for $\eta \in L^0_+(\mathcal{F}_s)$ follows by the same arguments as used at the end of the proof of Theorem 3.3.

Lemma 6.5 Let $(U, \gamma)$ be a pair of a utility random field and an admissible family of penalty functions satisfying (4.1). Let $V$ be the associated dual field, and suppose that the infimum in (3.2) is attained for $t \le T < \infty$ and $\eta \in L^1_+(\mathcal{F}_t)$. Then the following two statements are equivalent:

for all $Q \in \mathcal{Q}_{t,T}$ and $Z \in \mathcal{Z}^a_T$; moreover, for any $\bar{T} > s$, there are $\bar{Q} \in \mathcal{Q}_{s,\bar{T}}$ and $\bar{Z} \in \mathcal{Z}^a_{\bar{T}}$ such that (6.13) holds with equality for all $s \le t \le T \le \bar{T}$.

Furthermore, if either (a) $\mathcal{Q}_{0,T} = \tilde{\mathcal{Q}}_{0,T}$, $T > 0$, or$^{9}$ (b) for any $T > 0$ and $\zeta \in L^0(\mathcal{F}_T)$, we have $V^-(\zeta, T) \in L^1(\mathcal{F}_T; Q)$ for all $Q \in \tilde{\mathcal{Q}}_{0,T}$, then (i) and (ii) are equivalent to the following condition:

(iii) For any $s > 0$ and $\eta \in L^1_+(\mathcal{F}_s)$, for all $s \le t \le T < \infty$, (6.13) holds for all $Q \in \mathcal{Q}_{t,T}$ and $Z \in \mathcal{Z}^a_T$; moreover, there exist a process $Z \in \mathcal{Z}^a$ and a sequence of

Proof$^{10}$ In order to argue that (ii) implies (i), note that an application of (6.13) with $s \equiv t$ immediately yields that the pair $(V, \gamma)$ satisfies (3.2) for all $t \le T < \infty$ and $\eta \in L^1_+(\mathcal{F}_t)$; the extension to $\eta \in L^0_+(\mathcal{F}_t)$ then follows by the same arguments as in the proof of Theorem 3.3.
To show that (i) implies (ii), let $s > 0$ and $\eta \in L^1_+(\mathcal{F}_s)$. For $s \le t \le T < \infty$, $Z \in \mathcal{Z}^a_T$ and $Q \in \mathcal{Q}_{t,T}$, applying (3.2) with $\eta$ replaced by $\eta Z_{s,t} / Z^Q_{s,t}$ then yields

which implies the inequality (6.13). Next, let $\bar{T} > s$ and let $\bar{Z} \in \mathcal{Z}^a_{\bar{T}}$ and $\bar{Q} \in \mathcal{Q}_{s,\bar{T}}$ be the optimal objects for which $v(\eta; s, \bar{T})$ is attained. Since $\bar{Q} \in \mathcal{Q}_{s,\bar{T}}$, we have that $\bar{Q}|_{\mathcal{F}_T} \in \mathcal{Q}_{s,T}$ and $\bar{Q} \in \mathcal{Q}_{T,\bar{T}}$.

$^{9}$ Condition (b) holds e.g. if $U(x, T) \in L^1(\mathcal{F}_T, Q)$ for all $Q \in \tilde{\mathcal{Q}}_{0,T}$.

$^{10}$ In a previously circulated preprint version of this paper, (6.13) was stated only with $s = 0$ and $\eta = y \in \mathbb{R}_+$. Using regular conditional expectations and convexity, (6.13) then extends to $s = 0$ and $\eta \in L^0_+(\mathcal{F}_t)$ and the inequality "≤" in (3.2) then follows. However, our arguments to deduce equality in (3.2) were erroneous. Ignoring for simplicity the question of model uncertainty, knowing for $y > 0$ that there exists some $Z^y$ for which $(V(y Z^y_t, t))$ is a martingale allows one to deduce that (3.2) holds with equality for any $\eta$ of the form $\eta = y Z^y_t$, $y > 0$; but it is not clear why in all generality, this should then extend to $\eta \in L^1_+(\mathcal{F}_t)$. Similar comments apply also to the statement and proof of Theorem 3.14 in [69].
In turn, using that $(V, \gamma)$ is self-generating (cf. (3.2)) and performing a calculation similar to the one in (6.12) (which then holds with equalities throughout), we obtain that $v(\eta; s, T)$ is attained for $\bar{Z}_{s,T}$ and $\bar{Q}|_{\mathcal{F}_T}$, when $T \le \bar{T}$. We now claim that for $s \le t \le T \le \bar{T}$, (6.13) holds as equality for $\bar{Z}$ and $\bar{Q}$. Indeed, suppose contrary to the claim that there exist $\varepsilon > 0$ and $A \in \mathcal{F}_t$ with $P[A] > 0$ such that

$V(\eta \bar{Z}_{s,t} / Z^{\bar{Q}}_{s,t}, t) + \varepsilon 1_A \le E^{\bar{Q}}[V(\eta \bar{Z}_{s,T} / Z^{\bar{Q}}_{s,T}, T) \,|\, \mathcal{F}_t] + \gamma_{t,T}(\bar{Q}).$
Taking expectations under $\bar{Q}$ and using (4.1) combined with the fact that $v(\eta; s, t)$ and $v(\eta; s, T)$ are attained by $(\bar{Z}_{s,t}, \bar{Q}|_{\mathcal{F}_t})$ and $(\bar{Z}_{s,T}, \bar{Q}|_{\mathcal{F}_T})$, we then obtain a contradiction to the identity $v(\eta; s, t) = v(\eta; s, T)$ a.s.

Assertion (iii) trivially implies (ii). Hence, it only remains to show that (i) implies (iii). To this end, let $s < T_1 < T_2$ and let $(Z^1, Q^1) \in \mathcal{Z}^a_{T_1} \times \mathcal{Q}_{s,T_1}$ be an argument for which $v(\eta; s, T_1)$ is attained. In turn, let $(Z^*, Q^*) \in \mathcal{Z}^a_{T_2} \times \mathcal{Q}_{s,T_2}$ be an argument for which $v(\eta; s, T_2)$ is attained and define $Z^2$ and $Q^2$ by

We next show that also $(Z^2, Q^2)$ attains the infimum in $v(\eta; s, T_2)$. To this end, recall first from the proof of "(i) ⇒ (ii)" that $v(\eta; s, T_1)$ is attained for $(Z^*_{s,T_1}, Q^*|_{\mathcal{F}_{T_1}})$. Further, note that due to the strict convexity of $V(\,\cdot\,, t, \omega)$, $(t, \omega) \in [0, \infty) \times \Omega$, we have for any $z_0, z_1, y_0, y_1 \in (0, \infty)$ that

$\frac{z_0 + z_1}{2} V\Big( \frac{\frac{1}{2}(y_0 + y_1)}{\frac{1}{2}(z_0 + z_1)}, T_1, \omega \Big) \le \frac{1}{2} z_0 V\Big( \frac{y_0}{z_0}, T_1, \omega \Big) + \frac{1}{2} z_1 V\Big( \frac{y_1}{z_1}, T_1, \omega \Big),$

and the inequality is strict whenever $y_0 / z_0 \ne y_1 / z_1$; see [65, Eq. (21)]. In consequence, we must have $Z^1_{s,T_1} / Z^{Q^1}_{s,T_1} = Z^*_{s,T_1} / Z^{Q^*}_{s,T_1}$ a.s. Second, using that $(V, \gamma)$ is self-generating (cf. (3.2)) and the fact that $Q^* \sim P$, performing a similar calculation as in (6.12) (which then holds with equalities throughout), we obtain that $v(\eta Z^*_{s,T_1} / Z^{Q^*}_{s,T_1}; T_1, T_2)$ is attained for $(Z^*, Q^*)$. Combining the above two facts and using once again that $(V, \gamma)$ is self-generating, we obtain

$\cdots, T_1) \,|\, \mathcal{F}_s] + \gamma_{s,T_1}(Q^1) \ge v(\eta; s, T_1) = v(\eta; s, T_2),$

and thus $v(\eta; s, T_2)$ is attained for $(Z^2, Q^2)$; the fact that $Q^2 \in \mathcal{Q}_{0,T_2}$ is immediate under the full Assumption 4.1, and follows by similar arguments as in Lemma 6.4 under the assumption (b). We note that $(Z^2, Q^2)$ was constructed so that $Z^1_{T_1} = Z^2_{T_1}$ and $Q^1 = Q^2|_{\mathcal{F}_{T_1}}$, and that for any $s < T < T_2$, $v(\eta; s, T)$ is attained for $Z^2_{s,T}$ and $Q^2|_{\mathcal{F}_T}$.
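The midpoint inequality for the map $(z, y) \mapsto z V(y/z)$ used in the uniqueness step above can be checked numerically. The sketch below assumes, purely for illustration, the strictly convex function $V(y) = y \log y - y$; this choice and the function names are not taken from the paper.

```python
import math

def V(y):
    """A strictly convex function on (0, inf), used as an illustrative dual field."""
    return y * math.log(y) - y

def perspective(z, y):
    """The map (z, y) -> z * V(y / z) for z > 0 and y > 0."""
    return z * V(y / z)

def midpoint_gap(z0, y0, z1, y1):
    """Right-hand side minus left-hand side of the midpoint inequality.

    The gap is nonnegative by convexity and vanishes when y0/z0 == y1/z1.
    """
    lhs = 0.5 * (z0 + z1) * V((0.5 * (y0 + y1)) / (0.5 * (z0 + z1)))
    rhs = 0.5 * perspective(z0, y0) + 0.5 * perspective(z1, y1)
    return rhs - lhs

if __name__ == "__main__":
    print(midpoint_gap(1.0, 2.0, 3.0, 1.0))  # ratios differ: strictly positive gap
    print(midpoint_gap(1.0, 2.0, 2.0, 4.0))  # ratios agree: gap vanishes up to rounding
```

The second call shows why the argument only identifies the ratio $Z / Z^Q$ and not $Z$ and $Q$ separately: the inequality degenerates to an equality along rays with a common ratio.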
For any sequence $s < T_1 < T_2 < \cdots$, a repetition of the above pasting procedure yields a process $Z \in \mathcal{Z}^a$ and a sequence of measures $Q^i \in \mathcal{Q}_{s,T_i}$, $i \in \mathbb{N}$, with $Q^i = Q^{i+1}|_{\mathcal{F}_{T_i}}$, such that for all $T > s$, $v(\eta; s, T)$ is attained for $(Z_{s,T}, Q^T)$ with $Q^T := Q^i|_{\mathcal{F}_T} \in \mathcal{Q}_{s,T}$ for $T \le T_i$. In turn, applying again the same arguments as used to show that (i) implies (ii), we obtain that for any $s \le t \le T < \infty$, (6.13) holds with equality for $(Z, Q^T)$. Hence (iii) holds and we conclude.
We now argue that the results in Sect. 4 follow from the above lemmas. First, Theorem 3.3, Proposition 3.4 and Lemma 6.4 readily yield Proposition 4.2, while Proposition 4.5 follows from combining Theorem 3.3 and Proposition 3.4 with Lemma 6.5.
Next, we establish Proposition 4.4. To this end, without loss of generality, let $t = 0$ and $x \in \mathbb{R}$. Recall that $u(\,\cdot\,; 0, T)$ and $v(\,\cdot\,; 0, T)$ satisfy the conjugacy relations (see Theorem 3.3) and let $y^* > 0$ be the value for which the infimum in (3.4) is attained; $y^*$ is independent of $T$ since $u(x; 0, T) = U(x, 0)$, $T \ge 0$. By the same arguments as in the proof of Lemma 6.5 (cf. "(i) ⇒ (iii)"), it follows that there exist $Z \in \mathcal{Z}^a$ and a positive martingale $Y_t$, $t \ge 0$, such that for $T \ge 0$, $Q^T \in \mathcal{Q}_{0,T}$ with $dQ^T$

The latter implies that $\bar{X}_T = x - \int_0^T dF_t$ with $F_t := V(y^* Z_t / Y_t, t)$. In consequence, $\pi^{0,T}_0 = \bar{\pi}^{0,\bar{T}}_0$, $0 \le T \le \bar{T}$. To argue that $\pi^{t,T}_u(\xi) = \pi^{u,T}_u\big(\xi + \int_t^u \pi^{t,T}_s \, dS_s\big)$, $t \le u \le T$, assume contrary to the claim that there exist $\varepsilon > 0$ and $A \in \mathcal{F}_u$ with $P[A] > 0$ such that

(6.14)

Taking expectations under $Q^u$, using that $(U, \gamma)$ satisfies (2.12) and that (4.1) holds then yields

which gives the contradiction $u(x; 0, T) < u(x; 0, u)$. Similarly, assuming the reverse strict inequality in (6.14) also gives a contradiction and we conclude.

Proof of Propositions 2.1 and 2.2
Proof of Proposition 2.1 Let $0 \le t \le T < \infty$ be fixed. Throughout the proof, we write $\hat{W}_s = \hat{W}^1_s$. To alleviate the notation, let $L_s = \int_0^s \hat{\lambda}_u \, d\hat{W}_u$ and $M_s = \int_0^s \frac{\hat{\lambda}_u}{1 + \delta_u} \, d\hat{W}_u$. Recall that $\hat{E}[e^{\kappa \langle L \rangle_T}] < \infty$, $\kappa > 1/2$. Take $p, \bar{p} > 1$ such that $p^2 \bar{p}^2 \le 2\kappa$ and, with $\frac{1}{p} + \frac{1}{q} = 1 = \frac{1}{\bar{p}} + \frac{1}{\bar{q}}$, such that $\bar{q}\big(\frac{p^2 \bar{p}}{2} - \frac{p}{2}\big) = \frac{p \bar{p} (p \bar{p} - 1)}{2(\bar{p} - 1)} \le \kappa$. We then have

which is finite. More precisely, out of the three factors, the first is equal to one and the other two are finite, as is easily seen using Novikov's condition, the fact that $\langle M \rangle_T \le \langle L \rangle_T$ and the assumed integrability of $\langle L \rangle_T$. It follows that $\gamma_{t,T}(Q^{\bar{\eta}}) < \infty$ and hence $Q^{\bar{\eta}} \in \mathcal{Q}_{t,T}$.

Next, let $N^{\pi,\eta}_u := U(X^\pi_u, u) + \cdots$

for all $Q^\eta \in \mathcal{Q}_{t,T}$. For simplicity, and without loss of generality, we establish the claim for $t = 0$. For $\pi \in \mathcal{A}^x_0$, the wealth process then satisfies

$dX^\pi_s = \pi_s \sigma_s S_s \big( (\lambda_s + \eta^1_s) \, ds + dW^\eta_s \big), \quad s \le T, \quad X^\pi_0 = x,$ (6.15)

where $W^\eta$ is a Brownian motion under $Q^\eta$. Due to the form of $U$ and $\bar{\pi}$, a straightforward application of Itô's lemma yields

which is finite since the first of the two factors is equal to one and the second one is finite, as can be seen by applying Novikov's condition, the fact that $\langle M^\eta \rangle_T \le \langle L \rangle_T$ and the assumed integrability of $\langle L \rangle_T$.
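The algebra behind the choice of the Hölder exponents in the proof of Proposition 2.1 can be checked with exact rational arithmetic. The sketch below assumes that the condition reads $\bar{q}\big(\frac{p^2\bar{p}}{2} - \frac{p}{2}\big) = \frac{p\bar{p}(p\bar{p} - 1)}{2(\bar{p} - 1)}$ with $\frac{1}{\bar{p}} + \frac{1}{\bar{q}} = 1$; this reading of the exponents, and all function names, are assumptions made for illustration.

```python
from fractions import Fraction

def conjugate_exponent(p):
    """Hoelder conjugate q of p > 1, defined by 1/p + 1/q = 1."""
    return p / (p - 1)

def lhs(p, p_bar):
    """q_bar * (p^2 * p_bar / 2 - p / 2), with q_bar conjugate to p_bar."""
    q_bar = conjugate_exponent(p_bar)
    return q_bar * (p * p * p_bar / 2 - p / 2)

def rhs(p, p_bar):
    """p * p_bar * (p * p_bar - 1) / (2 * (p_bar - 1))."""
    return p * p_bar * (p * p_bar - 1) / (2 * (p_bar - 1))

if __name__ == "__main__":
    for p, p_bar in [(Fraction(3, 2), Fraction(5, 4)), (Fraction(11, 10), Fraction(2))]:
        print(p, p_bar, lhs(p, p_bar), rhs(p, p_bar))
```

Under this reading, letting $p$ tend to 1 reduces the quantity to $\bar{p}/2$, which can be brought arbitrarily close to $1/2$; this appears consistent with the integrability requirement being imposed for $\kappa > 1/2$.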
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.