1 Introduction

There has been extensive research in both stochastic control and convex optimization; see, for example, the books [9, 16, 24] on stochastic control and [2, 8, 19] on convex optimization for excellent expositions of theory, computation, and applications. Linear convex (LC) stochastic control has a state process satisfying a controlled linear stochastic differential equation (SDE) and an objective function that is convex in the state and control variables. By convexity, any locally optimal solution is a global one. LC stochastic control covers many applications, for example, aggregate production and work-force planning [13], stochastic inventory control [3], the consumption–investment problem [5], reinforcement learning [10], etc.

If the objective function is quadratic and the control set is the whole space, then the optimal control is an affine function of the state variable, and its form can be determined from the solution of a fully coupled linear forward–backward SDE (FBSDE) and a stochastic Riccati equation (SRE), see [21, 24]. There are many extensions with additional constraints and other conditions: for example, [11] introduces the extended SRE and provides an explicit characterization of the optimal control of a stochastic linear quadratic (LQ) control problem with random coefficients, cone control constraints, and a scalar state variable, and [12] derives the stochastic maximum principle (SMP) for the LQ problem with a nonconvex control domain.

There are many references in the literature on solving LC problems. For example, [3] identifies some specific LC problems whose solutions can be obtained by solving appropriate equivalent deterministic optimal control problems. [4] uses conjugate functions for the LC problem. [5] derives the SMP for the LC problem with a multidimensional state process and control constraints. [7] studies a discrete-time LC problem with scalar control and describes explicit solutions of suitable Bellman equations.

The standard methods of stochastic control can be used to characterize the optimal control and state processes of LC problems, in the form of the Hamilton–Jacobi–Bellman (HJB) equation for models with deterministic coefficients or the FBSDE and the maximum condition, but it is in general difficult to solve these equations in the presence of control constraints and non-quadratic objective functions: the HJB equation is a fully nonlinear multidimensional partial differential equation (PDE), and the FBSDE is fully coupled and nonlinear. Since LC problems are convex, one may use the ideas and methodologies developed for convex optimization to solve them. One approach is to convert the dynamic model into a static convex optimization problem in some abstract space, derive the dual problem and establish the relations between primal and dual optimal solutions, and finally convert the results back to the dynamic model. The main advantage of solving a static convex optimization problem is that all known results for conjugate duality in [19] can be applied, but solving an infinite-dimensional constrained convex optimization problem is highly difficult, see [14] for details.

The model setting of this paper is largely that of [5], but without the condition that the running objective function is continuously differentiable in the control variables and without the additional state constraint imposed in [5]. The SMP in [5] still holds and can be proven in essentially the same way. [14, 20] are close to our paper in terms of the dual control formulation and the relation of primal and dual optimal solutions. [14] discusses the quadratic risk minimization of a controlled wealth process (a scalar stochastic process in mathematical finance), formulates the dual problem, and proves the existence of a dual solution for a mean-variance problem. [20] studies a deterministic LC control problem without control constraints in the framework of duality for calculus-of-variations problems and proves some regularity properties of the value function and the optimal control.

The main contribution of this paper is to solve the LC stochastic control problem via convex duality theory and to derive the relation between the primal and dual optimal solutions, which, to the best of the authors' knowledge, has not been discussed in [5, 14, 20] or anywhere else in the literature. Instead of converting the LC problem into an abstract convex optimization problem as in [14], we use the supermartingale approach of [15], which gives necessary and sufficient optimality conditions for a scalar LQ problem with control constraints. One complication is that the dual running objective function may be nondifferentiable, so the resulting backward SDE (BSDE) for the dual adjoint process is not well defined in the usual BSDE sense. We transform the dual problem with a new dual control variable to resolve this issue.

The usefulness of the dual formulation is highlighted with several examples featuring nonsmooth running costs, bounded and unbounded control constraint sets, and random coefficients, for which the primal problem is difficult to solve with the standard methods of maximizing the Hamiltonian and solving the FBSDE, or of finding the value function via the HJB equation: the nondifferentiability or the control constraints make these methods ineffective. In contrast, the dual problems of these examples can be solved, and the primal optimal solutions can be constructed via the primal–dual relation.

The paper is organized as follows: Sect. 2 states the model, the SMP (Theorem 1), and the dual problem. Section 3 discusses the transformed dual problem, the dual SMP (Theorem 2), and the primal–dual relation (Theorems 3 and 4). Section 4 solves some examples. Section 5 concludes. Appendix gives the proof of Theorem 2.

2 Primal and Dual Problems

We assume a complete probability space \((\Omega ,\mathcal {F},\mathbb {F},P)\), where \(\mathbb {F}:=\{\mathcal {F}_t\}_{t\in [0,T]}\) is the P-augmentation of the natural filtration \(\{\mathcal {F}_t^W\}_{t\in [0,T]}\) generated by d-dimensional independent standard Brownian motions \(\{(W_1(t), \ldots ,W_d(t))\}_{t\in [0,T]}\). Denote by \(\mathbb {R}^{n\times m}\) the space of \(n\times m\) matrices, \(\mathbb {R}^{n}\) the space of n-dimensional vectors, \(M^{\top }\) the transpose of matrix M, \(\hbox {tr}(M)\) the trace of a square matrix M, \(|M|=\sqrt{\hbox {tr}(M^{\top }M)}\) the Frobenius norm of matrix M, \(\mathcal {P}(0,T;\mathbb {R}^{n})\) the set of \(\mathbb {R}^{n}\)-valued progressively measurable processes on \([0,T]\times \Omega \), \(\mathcal {H}(0,T;\mathbb {R}^{n})\) the set of processes x in \(\mathcal {P}(0,T;\mathbb {R}^{n})\) such that \(E[\int _0^T|x(t)|^2dt]<\infty \), and \(\mathcal {S}(0,T;\mathbb {R}^{n})\) the set of processes x in \(\mathcal {P}(0,T;\mathbb {R}^{n})\) such that \(E[\sup _{0\le t\le T}|x(t)|^2]<\infty \).

Define the set of admissible controls by

$$\begin{aligned} \mathcal {A}:=\left\{ u\in \mathcal {H}(0,T;\mathbb {R}^m): u(t)\in K \hbox { for a.e. } t\in [0,T] \right\} , \end{aligned}$$

where \(K\subseteq \mathbb {R}^m\) is a nonempty closed convex set.

Given any \(u\in \mathcal {A}\), consider the state process X satisfying the following SDE:

$$\begin{aligned} dX(t)&=\left[ A(t)X(t)+B(t)u(t)\right] dt+\sum _{i=1}^d\left[ C_i(t)X(t)+D_i(t)u(t)\right] dW_i(t),\nonumber \\ X(0)&=x_0\in \mathbb {R}^n, \end{aligned}$$
(1)

where processes \(A,C_i:\Omega \times [0,T]\rightarrow \mathbb {R}^{n\times n}\) and \(B,D_i:\Omega \times [0,T]\rightarrow \mathbb {R}^{n\times m}\), \(i=1,\ldots ,d\), are \(\mathbb {F}\)-progressively measurable and uniformly bounded. The pair \((X,u)\) is admissible if X is a solution to SDE (1) with control \(u\in \mathcal {A}\).
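For readers who wish to experiment numerically, the following Euler–Maruyama sketch simulates the state equation (1); the constant coefficients and the zero control in the usage example are illustrative assumptions, not part of the model.

```python
import numpy as np

def simulate_state(x0, A, B, C_list, D_list, u, T=1.0, N=500, seed=0):
    """Euler-Maruyama sketch for the controlled linear SDE (1).

    A, B, C_list, D_list, u are callables of t (deterministic coefficients
    assumed for simplicity); C_list, D_list hold the d diffusion
    coefficients C_i, D_i."""
    rng = np.random.default_rng(seed)
    dt = T / N
    d = len(C_list)
    X = np.array(x0, dtype=float)
    path = [X.copy()]
    for k in range(N):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt), size=d)
        dX = (A(t) @ X + B(t) @ u(t)) * dt
        for i in range(d):
            dX = dX + (C_list[i](t) @ X + D_list[i](t) @ u(t)) * dW[i]
        X = X + dX
        path.append(X.copy())
    return np.array(path)

# usage with placeholder data: n = 2, m = 1, d = 1, zero control
A = lambda t: np.array([[0.0, 1.0], [0.0, 0.0]])
B = lambda t: np.array([[0.0], [1.0]])
C_list = [lambda t: 0.1 * np.eye(2)]
D_list = [lambda t: np.zeros((2, 1))]
u = lambda t: np.zeros(1)
path = simulate_state([1.0, 0.0], A, B, C_list, D_list, u)
```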

Consider the functional \(J:\mathcal {A}\rightarrow \mathbb {R}\), defined by

$$\begin{aligned} J(u):=E\left[ \int _0^T f(t,X(t),u(t))dt+g(X(T)) \right] , \end{aligned}$$
(2)

where \(f:\Omega \times [0,T]\times \mathbb {R}^n\times \mathbb {R}^m\rightarrow \mathbb {R}\) and \(g:\Omega \times \mathbb {R}^n\rightarrow \mathbb {R}\) are measurable functions, f is \(\mathbb {F}\)-progressively measurable for fixed (x, u), convex in (x, u), \(C^1\) in x, continuous in u, and g is \(\mathcal {F}_T\)-measurable for fixed x, convex and \(C^1\) in x. The functions f, g are sufficiently general to cover many common objective functions, such as quadratic functions, discounted cost functions with \(f(t,x,u)=e^{-rt}\tilde{f}(x,u)\), etc. We denote by \( f_x(t,x,u)\) the partial derivative of f with respect to x and use similar notations for other derivatives.

The optimization problem is the following:

$$\begin{aligned} \hbox {Minimize } J(u) \hbox { subject to } (X,u) \hbox { admissible}. \end{aligned}$$
(3)

An admissible pair \((\hat{X},\hat{u})\) is optimal if \(J(\hat{u})\le J(u)\) for all \(u\in \mathcal {A}\). To shorten notation, we will omit the time variable t in the rest of the paper when no confusion can arise, for example, writing A instead of A(t) and \(\int _0^T f(t,X,u)dt\) instead of \(\int _0^T f(t, X(t), u(t))dt\).

Problem (3) is studied in [5], which proves the SMP and applies the results to the consumption–investment problem and to square-integrable controls.

We need the following assumption:

Assumption 1

Let \((\hat{X}, \hat{u})\) be an admissible pair satisfying \( E[\int _0^T | f_x(t,\hat{X}, \hat{u})|^2dt]<\infty \) and \( E[| g_x(\hat{X}(T))|^2]<\infty \). There exist \(Z\in \mathcal {P}(0,T;\mathbb {R})\) and an \(\mathcal {F}_T\)-measurable random variable \(\tilde{Z}\) satisfying \(E[\int _0^T|Z(t)|dt]<\infty \), \(E[|\tilde{Z}|]<\infty \) such that for any admissible pair (X, u) and \(\epsilon \in (0,1]\),

$$\begin{aligned} Z(t)&\ge \frac{f(t, \hat{X}+ \epsilon X , \hat{u} + \epsilon u) - f(t, \hat{X}, \hat{u}) }{\epsilon },\\ \tilde{Z}&\ge \frac{g(\hat{X}(T) + \epsilon X(T) )-g(\hat{X}(T) )}{\epsilon } \end{aligned}$$

for \((P\otimes \text {Leb})\)-a.e. \((\omega ,t)\in \Omega \times [0,T]\).

A sufficient condition for Assumption 1 to hold is that f, g are \(C^1\) in x, u and their derivatives have linear growth, that is, \(|f_x(t,x,u)|+|f_u(t,x,u)|\le C(1+|x|+|u|)\) and \(| g_x(x)|\le C(1+|x|)\) for all t, x, u and some constant C; this covers quadratic functions.

The Hamiltonian \(H:\Omega \times [0,T]\times \mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^n\times \mathbb {R}^{nd}\rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} H(t,x,u,p_1,q_1):=x^{\top }A^{\top }p_1+u^{\top }B^{\top }{p}_1+\sum _{i=1}^dx^{\top }C_i^{\top }{q}_{1,i}+\sum _{i=1}^du^{\top }D_i^{\top }{q}_{1,i}-f(t,x,u), \end{aligned}$$
(4)

where \(p_1\in \mathbb {R}^n\) and \(q_1:=(q_{1,1}, \ldots , q_{1, d})\) and \(q_{1,i}\in \mathbb {R}^n\).

The next theorem states the SMP for problem (3), see [5, Theorem 1.5].

Theorem 1

Let \(\hat{u}\in \mathcal {A}\) and Assumption 1 hold. Then, \(\hat{u}\) is optimal for problem (3) if and only if the solution \((\hat{X},\hat{p}_1,\hat{q}_1)\) of the FBSDE

$$\begin{aligned} d\hat{X}&=[A\hat{X}+B\hat{u}]dt+\sum _{i=1}^d[C_i\hat{X}+D_i\hat{u}]dW_i,\nonumber \\ \hat{X}(0)&=x_0,\nonumber \\ d\hat{p}_1&=-[A^{\top }\hat{p}_1+\sum _{i=1}^dC_i^{\top }\hat{q}_{1,i}- f_x(t,\hat{X},\hat{u})]dt+\sum _{i=1}^d\hat{q}_{1,i}dW_i,\nonumber \\ \hat{p}_1(T)&=- g_x(\hat{X}(T)), \end{aligned}$$
(5)

satisfies the condition

$$\begin{aligned} H(t,\hat{X}(t),\hat{u}(t),\hat{p}_1(t),\hat{q}_1(t))=\max _{u\in K}H(t,\hat{X}(t),{u},\hat{p}_1(t),\hat{q}_1(t)), \end{aligned}$$
(6)

for \((P\otimes \hbox {Leb})\)-a.e. \((\omega ,t)\in \Omega \times [0,T]\). Moreover, if \( f_u\) exists, then (6) is equivalent to

$$\begin{aligned}{}[\hat{u}-u]^{\top }[B^{\top }\hat{p}_1+\sum _{i=1}^dD_i^{\top }\hat{q}_{1,i}- f_u(t,\hat{X},\hat{u})]\ge 0, \quad \forall u\in K. \end{aligned}$$

The processes \(\hat{p}_1\in \mathcal {S}(0,T;\mathbb {R}^n)\) and \(\hat{q}_{1,i}\in \mathcal {H}(0,T;\mathbb {R}^n)\), \(i=1,\ldots ,d\), satisfy a BSDE, called the adjoint equation associated with the admissible pair \((\hat{X}, \hat{u})\). The proof of Theorem 1 is standard and therefore omitted.

Remark 1

In [5], f is assumed to be \(C^1\) in u as well as in x, which simplifies Assumption 1 by allowing partial derivatives instead of difference quotients; these conditions are used in the proofs to ensure that the monotone convergence theorem can be applied, while the key ideas and proofs remain largely the same, see [5] and [15] for details. Since f here is continuous, but not necessarily \(C^1\) in u, we need to use the subdifferential of convex analysis to characterize the optimal solution, instead of the simple gradient available when f is \(C^1\) in u, see the examples in Sect. 4. In [5], there is a state constraint \(X(t)\in V\) as well as a control constraint \(u(t)\in U\) for all \(t\in [0,T]\). For X satisfying the linear SDE (1), one cannot in general ensure \(X(t)\in V\) for all t, so additional conditions on admissible controls u are needed, see (77), (79), etc., in [5]. In contrast, we assume f, g are well defined on the whole space and there is no constraint on the state process X, so we do not need additional conditions. One drawback of our model is that we cannot deal with the investment–consumption model discussed in [5], as utility functions are only defined on the positive real line, not the whole space, and do not satisfy our assumptions. However, the key objective of our paper is different from that of [5]: we aim to solve the primal problem indirectly with the dual approach when it is too difficult or complicated to solve directly with the primal SMP, see the examples in Sect. 4, where the dual method finds optimal solutions that would be highly difficult or impossible to obtain by working directly with the primal problem.

We now formulate the dual problem. Since X is driven by Brownian motions \(W_i\), \(i=1,\ldots ,d\), as well as control process u, the dual process Y should satisfy the following SDE:

$$\begin{aligned} dY = \tilde{\alpha }dt + \sum _{i=1}^d \beta _i dW_i \end{aligned}$$

with the initial condition \(Y(0)=y\), where \(\tilde{\alpha },\beta _i\in \mathcal {H}(0,T;\mathbb {R}^n)\) and \(y\in \mathbb {R}^n\) are to be determined. Since X satisfies SDE (1), using Ito’s lemma, we have

$$\begin{aligned} d(X^{\top }Y)&= \left[ X^{\top }\left( A^{\top }Y +\tilde{\alpha }+ \sum _{i=1}^d C_i^{\top } \beta _i\right) + u^{\top }\left( B^{\top } Y + \sum _{i=1}^d D_i^{\top } \beta _i\right) \right] dt\\&\quad +\sum _{i=1}^d[X^{\top }\beta _i+Y^{\top }(C_i{X}+D_i{u})] dW_i. \end{aligned}$$

Let \(\alpha =A^{\top }Y +\tilde{\alpha }+ \sum _{i=1}^d C_i^{\top } \beta _i\). Then, the dual process Y satisfies the following SDE:

$$\begin{aligned} dY=[\alpha -A^{\top }Y -\sum _{i=1}^dC_i^{\top }\beta _i]dt+\sum _{i=1}^d\beta _idW_i \end{aligned}$$
(7)

with \(Y(0)=y\), where \(\alpha ,\beta _i\in \mathcal {H}(0,T;\mathbb {R}^n)\) and \(y\in \mathbb {R}^n\) are to be determined. There is a unique solution Y to SDE (7) for given \((y, \alpha ,\beta _1,\ldots ,\beta _d)\). We call \((\alpha ,\beta _1,\ldots ,\beta _d)\) the admissible dual control and \((Y, \alpha ,\beta _1,\ldots ,\beta _d)\) the admissible dual pair. Since

$$\begin{aligned} d(X^{\top }Y)&=[X^{\top }\alpha +u^{\top }\beta ]dt+\sum _{i=1}^d[X^{\top }\beta _i+Y^{\top }(C_i{X}+D_i{u})] dW_i, \end{aligned}$$

where

$$\begin{aligned} \beta =B^{\top }Y+\sum _{i=1}^dD_i^{\top }\beta _i, \end{aligned}$$
(8)

the process \(X^{\top }(t)Y(t)-\int _0^t\left[ X^{\top }\alpha +u^{\top }\beta \right] ds\) is a local martingale, and a supermartingale if it is bounded below by an integrable process, which gives

$$\begin{aligned} E\left[ X(T)^{\top }Y(T)-\int _0^T(X^{\top }\alpha +u^{\top }\beta )ds\right] \le x_0^{\top }y. \end{aligned}$$
(9)

Problem (3) can be written equivalently as

$$\begin{aligned} \sup _u E\left[ -\int _0^T \tilde{f}(t,X,u)dt-g(X(T)) \right] , \end{aligned}$$

where \(\tilde{f}(t,x,u)=f(t,x,u)+\Psi _K(u)\) and \(\Psi _K(u)=0\) if \(u\in K\) and \(+\infty \) otherwise.

Define the dual function \(\phi : \Omega \times [0,T]\times \mathbb {R}^n\times \mathbb {R}^m\rightarrow \mathbb {R}\cup \{+\infty \}\) by

$$\begin{aligned} \phi (t,\alpha ,\beta ):=\displaystyle \sup _{x,u}\left\{ x^{\top }\alpha +u^{\top }\beta -\tilde{f}(t,x,u)\right\} \end{aligned}$$
(10)

and \(h:\Omega \times \mathbb {R}^n\rightarrow \mathbb {R}\cup \{+\infty \}\) by

$$\begin{aligned} h(y):=\displaystyle \sup _{x}\left\{ -x^{\top }y-g(x)\right\} . \end{aligned}$$
(11)

The functions \(\phi \) and h are proper closed convex functions [2, Proposition 1.1.6, Proposition 1.6.1].
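As a simple illustration of (10) and (11) (an elementary computation, not drawn from the cited references), take \(f(t,x,u)=\frac{1}{2}|x|^2+\frac{1}{2}|u|^2\), \(K=\mathbb {R}^m\), and \(g(x)=\frac{1}{2}|x|^2\). The suprema are attained at \(x=\alpha \), \(u=\beta \), and \(x=-y\), respectively, giving

$$\begin{aligned} \phi (t,\alpha ,\beta )=\frac{1}{2}|\alpha |^2+\frac{1}{2}|\beta |^2, \qquad h(y)=\frac{1}{2}|y|^2, \end{aligned}$$

so quadratic primal data produce quadratic dual data, consistent with Remark 4 below.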

Combining (9), (10), and (11) yields the following inequality

$$\begin{aligned} \sup _u E\left[ -\int _0^T \tilde{f}(t,X,u)dt-g(X(T)) \right] \le \inf _{y,\alpha ,\beta _1,\ldots ,\beta _d} \left\{ x_0^{\top }y \right. \nonumber \\ \left. +E\left[ \int _0^T\phi (t,\alpha ,\beta )dt+h(Y(T))\right] \right\} . \end{aligned}$$
(12)

The dual control problem is defined by

$$\begin{aligned} \inf _{y,\alpha ,\beta _1,\ldots ,\beta _d} \left\{ x_0^{\top }y+E\left[ \int _0^T\phi (t,\alpha ,\beta )dt+h(Y(T))\right] \right\} , \end{aligned}$$
(13)

where Y satisfies SDE (7) and \( \beta \) is given by (8). We can solve (13) in two steps: First, for fixed y, solve a stochastic control problem:

$$\begin{aligned} V(y):= \inf _{\alpha ,\beta _1,\ldots ,\beta _d} E\left[ \int _0^T\phi (t,\alpha ,B^{\top }Y+\sum _{i=1}^dD_i^{\top }\beta _i)dt+h(Y(T))\right] , \end{aligned}$$

and, second, solve a finite-dimensional optimization problem:

$$\begin{aligned} \inf _y \left\{ x_0^{\top }y+V(y)\right\} . \end{aligned}$$
(14)

Remark 2

If inequality (12) holds as an equality, then there is no duality gap and solving the dual problem is equivalent to solving the primal problem. The dual problem may be easier or harder to solve than the primal one. Whether or not the dual problem can be solved, it always provides useful information on bounds of the value function. From (12), we have a lower bound

$$\begin{aligned} \inf _u E\left[ \int _0^T \tilde{f}(t,X,u)dt+g(X(T)) \right] \ge -\left( x_0^{\top }y+E\left[ \int _0^T\phi (t,\alpha ,\beta )dt+h(Y(T))\right] \right) \end{aligned}$$

as well as an obvious upper bound

$$\begin{aligned} \inf _u E\left[ \int _0^T \tilde{f}(t,X,u)dt+g(X(T)) \right] \le E\left[ \int _0^T \tilde{f}(t,X,u)dt+g(X(T)) \right] \end{aligned}$$

for all admissible controls u and all \(y,\alpha ,\beta _1,\ldots ,\beta _d\). If one can make the gap between the lower and upper bounds sufficiently small, then one has found a good approximation to the value function and the optimal control. Note that the lower bound would be unobtainable without the dual formulation, see [15, 25] for detailed discussions and applications in mathematical finance.
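The following is a minimal Monte Carlo sketch of how these bounds could be estimated in practice; the two sampling routines are placeholders that a user would supply for a concrete model, so this is an illustrative skeleton rather than an implementation from the paper.

```python
import numpy as np

def duality_gap_bounds(sample_primal_cost, sample_dual_cost, n_paths=10000, seed=0):
    """Estimate the bounds of Remark 2 by plain Monte Carlo.

    sample_primal_cost(rng): one realization of
        int_0^T f~(t,X,u) dt + g(X(T))   for a candidate control u;
    sample_dual_cost(rng): one realization of
        x_0^T y + int_0^T phi(t,alpha,beta) dt + h(Y(T))
        for candidate dual controls (y, alpha, beta_1, ..., beta_d).
    Both simulators are user-supplied placeholders.
    """
    rng = np.random.default_rng(seed)
    upper = np.mean([sample_primal_cost(rng) for _ in range(n_paths)])
    lower = -np.mean([sample_dual_cost(rng) for _ in range(n_paths)])
    return lower, upper  # the value function lies in [lower, upper]

# stub usage with constant placeholders (a real model supplies path simulators):
lo, up = duality_gap_bounds(lambda rng: 1.0, lambda rng: -0.5, n_paths=100)
print(lo, up)
```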

The Hamiltonian \(\tilde{H}:\Omega \times [0,T]\times \mathbb {R}^n\times \mathbb {R}^n\times \mathbb {R}^{nd}\times \mathbb {R}^n\times \mathbb {R}^{nd}\rightarrow \mathbb {R}\) for the dual problem is defined by

$$\begin{aligned} \tilde{H}(t, y, \alpha ,\beta _1,\ldots ,\beta _d,p_2,q_2):=&p_2^{\top }(\alpha -A^{\top } y- \sum _{i=1}^d C_i^{\top }\beta _i) \nonumber \\&+\sum _{i=1}^dq_{2,i}^{\top }\beta _i-\phi (t, \alpha ,B^{\top }y+\sum _{i=1}^dD_i^{\top }\beta _i), \end{aligned}$$
(15)

where \(p_2\in \mathbb {R}^n\) and \(q_2:=(q_{2,1}, \ldots , q_{2, d})\) and \(q_{2,i}\in \mathbb {R}^n\).

To state the SMP for the dual problem, we need a similar assumption to that of the primal problem.

Assumption 2

Let \((\hat{Y}, \hat{\alpha },\hat{\beta }_1,\ldots ,\hat{\beta }_d)\) be a given admissible dual pair. There exist \(Z\in \mathcal {P}(0,T;\mathbb {R})\) and an \(\mathcal {F}_T\)-measurable random variable \(\tilde{Z}\) satisfying \(E[\int _0^T|Z(t)|dt]<\infty \), \(E[|\tilde{Z}|]<\infty \) such that for any admissible dual pair \((Y, \alpha ,{\beta }_1,\ldots ,{\beta }_d)\),

$$\begin{aligned} Z(t)&\ge \frac{\phi (t,\hat{\alpha }+\epsilon \alpha ,\hat{\beta }+\epsilon \beta )-\phi (t,\hat{\alpha },\hat{\beta })}{\epsilon },\\ \tilde{Z}&\ge \frac{h(\hat{Y}(T)+\epsilon Y(T))-h(\hat{Y}(T))}{\epsilon } \end{aligned}$$

for \((P\otimes \text {Leb})\)-a.e. \((\omega ,t)\in \Omega \times [0,T]\) and \(\epsilon \in (0,1]\). Furthermore, h is \(C^1\) and satisfies \(E[| h_y(Y(T))|^2]<\infty \).

A sufficient condition for Assumption 2 to hold is that \(\phi ,h\) are \(C^1\) and \(|\phi _\alpha (t,\alpha , \beta )|+|\phi _\beta (t,\alpha , \beta )|\le C(1+|\alpha |+|\beta |)\) and \(| h_y(y)|\le C(1+|y|)\) for all \(t,\alpha ,\beta , y\) and some constant C.

Remark 3

Assumptions 1 and 2 are closely related and can be derived from each other under additional conditions. For example, if f, g are \(C^2\) with bounded second derivatives, K is the whole space, and

$$\begin{aligned} \left( \begin{array}{cc} f_{xx} &{} f_{xu}\\ f_{ux} &{} f_{uu} \end{array} \right) (t,x,u)\ge cI_{n+m}, \quad g_{xx}(x)\ge cI_n \end{aligned}$$

for all t, x, u and some positive constant c, where \(I_n, I_{n+m}\) are identity matrices, then both Assumptions 1 and 2 are satisfied. Assumption 1 holds because bounded second derivatives imply that the first-order derivatives have linear growth. To see that Assumption 2 holds, note that by definition \(h(y)=-\bar{x}^{\top }y-g(\bar{x})\), where \(\bar{x}\) is the maximum point of \(-x^{\top }y-g(x)\) over all x and satisfies the equation \(-y-g_x(\bar{x})=0\), which gives \(h_y(y)=-\bar{x}\). Differentiating \(-y-g_x(\bar{x})=0\) in y yields \(-I_n - g_{xx}(\bar{x}){\partial \bar{x}\over \partial y}=0\); combining this with \(g_{xx}(x)\ge cI_n\) for all x, we have \({\partial \bar{x}\over \partial y}=- g_{xx}(\bar{x})^{-1}\), a negative definite matrix with bounded norm, which implies that \(h_y\) has linear growth. The linear growth of \(\phi _\alpha , \phi _\beta \) can be proved similarly. For general functions f, g and set K, it is less clear whether Assumptions 1 and 2 are equivalent, but they are clearly related, as \(\phi , f\) and h, g are conjugate functions of each other.

If \(\phi \) is \(C^1\) in \(\beta \) and Assumption 2 holds, then the adjoint equation associated with the dual problem is given by

$$\begin{aligned} dp_2&=-\tilde{H}_y(t, Y, \alpha ,\beta _1,\ldots ,\beta _d,p_2,q_2)dt+\sum _{i=1}^dq_{2,i}dW_i\nonumber \\&=[Ap_2+B \phi _\beta (t,\alpha ,B^{\top }Y+\sum _{i=1}^dD_i^{\top }\beta _i)]dt+\sum _{i=1}^dq_{2,i}dW_i,\nonumber \\ p_2(T)&=- h_y(Y(T)). \end{aligned}$$
(16)

We can characterize the dual optimal control \((\hat{\alpha },\hat{\beta }_1,\ldots ,\hat{\beta }_d)\) with SDE (7) and BSDE (16) and the maximum condition

$$\begin{aligned}&\tilde{H}(t,\hat{Y}(t), \hat{\alpha }(t),\hat{\beta }_1(t),\ldots ,\hat{\beta }_d(t),\hat{p}_2(t),\hat{q}_2(t))\\&\quad \quad =\max _{\alpha ,\beta _1,\ldots ,\beta _d}\tilde{H}(t,\hat{Y}(t),\alpha ,\beta _1,\ldots ,\beta _d,\hat{p}_2(t),\hat{q}_2(t)) \end{aligned}$$

and \(\hat{y}\) is determined from (14).

Remark 4

If f, g in (2) are strictly convex quadratic functions and K is the whole space, then \(\phi , h\) are also strictly convex quadratic functions. The optimal primal and dual controls can be expressed as affine functions of their corresponding state and adjoint processes, the primal and dual FBSDEs simplify to fully coupled linear FBSDEs with random coefficients, and the relation of their solutions can be explicitly specified, see [23] for details on the solvability of linear FBSDEs. If all coefficients of the model are deterministic, then these linear FBSDEs can be further reduced to equivalent Riccati ordinary differential equations and their solutions can be recovered from each other.
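To make the reduction in Remark 4 concrete, the following sketch integrates the standard scalar stochastic Riccati equation for an LQ problem with deterministic constant coefficients, state equation \(dX=(aX+bu)dt+(cX+du)dW\), and cost \(\frac{1}{2}E[\int _0^T(QX^2+Ru^2)dt+GX(T)^2]\); the numerical values are illustrative assumptions, not data from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# scalar stochastic LQ data (illustrative constants)
a, b, c, d = 0.1, 1.0, 0.2, 0.5   # dX = (aX + bu)dt + (cX + du)dW
Q, R, G = 1.0, 1.0, 1.0           # cost (1/2)E[int (QX^2 + Ru^2)dt + GX(T)^2]
T = 1.0

def riccati_rhs(t, P):
    # scalar stochastic Riccati ODE, integrated backwards in time:
    # P' = -((2a + c^2)P + Q - ((b + c d)P)^2 / (R + d^2 P)),  P(T) = G
    return -((2 * a + c**2) * P + Q - ((b + c * d) * P) ** 2 / (R + d**2 * P))

sol = solve_ivp(riccati_rhs, [T, 0.0], [G], dense_output=True, rtol=1e-8)
P0 = sol.y[0, -1]
print("P(0) =", P0)
# candidate optimal feedback: u(t) = -((b + c d) P(t) / (R + d^2 P(t))) X(t)
```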

3 Transformed Dual Problem and Primal–Dual Relation

The BSDE (16) for the dual problem requires \(\phi \) to be differentiable in \(\beta \). If that condition is not satisfied, then (16) is not well defined in the usual sense of BSDEs. One may try to extend the definition of the BSDE and replace the derivative with a set-valued mapping, as commonly used in deterministic nonsmooth control and optimization, see [6, 22], and also [1] for some recent work on set-valued BSDEs, but this is far beyond the scope of this paper. We instead focus on solving the dual problem with a transformation method for the nonsmooth function \(\phi \).

The key reason we need \(\phi \) to be differentiable in \(\beta \) is that \(\beta \) defined in (8) depends on Y, and the adjoint equation (16) involves differentiating the dual Hamiltonian \(\tilde{H}\) in (15) with respect to the state variable y. If we can change \(\beta \) into a control variable independent of Y, then the differentiability issue of \(\phi \) disappears. This simple idea leads us to reformulate the dual problem into an equivalent one with different dual controls.

We replace one of the dual controls \(\beta _i\) by \(\beta \), which requires a condition on \(D_i(t)\in \mathbb {R}^{n\times m}\). Without loss of generality, we choose \(i=d\) and assume the following condition:

Assumption 3

\(n\le m\), \(\hbox {rank}(D_d(t))=n \), and \(D_d^{\dagger }(t):=D_d^{\top }(t)(D_d(t)D_d^{\top }(t))^{-1}\) is uniformly bounded for \(0\le t\le T\).

\( D_d^{\dagger }\in \mathbb {R}^{m\times n}\) is the Moore–Penrose inverse of \(D_d\) and satisfies \(D_dD_d^{\dagger }=I_n\). From (8), we then obtain

$$\begin{aligned} \beta _d=(D_d^{\dagger })^{\top }(\beta -B^{\top }Y-\sum _{i=1}^{d-1}D_i^{\top }{\beta }_i). \end{aligned}$$
(17)
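Numerically, the right-inverse property behind (17) is immediate to check; a NumPy fragment with a generic full-row-rank matrix as a stand-in for \(D_d(t)\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 3
D = rng.normal(size=(n, m))               # full row rank almost surely
D_dag = D.T @ np.linalg.inv(D @ D.T)      # Moore-Penrose right inverse D^dagger
assert np.allclose(D @ D_dag, np.eye(n))  # D D^dagger = I_n
# (17): beta_d = (D^dagger)^T (beta - B^T Y - sum_{i<d} D_i^T beta_i)
```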

Using (7) and (17), the dual process Y satisfies the following SDE:

$$\begin{aligned} dY&=[\alpha -A^{\top }Y-\sum _{i=1}^{d-1}C_i^{\top }{\beta }_i-C_d^{\top }(D_d^{\dagger })^{\top }(\beta -B^{\top }Y-\sum _{i=1}^{d-1}D_i^{\top }{\beta }_i)]dt\nonumber \\&\quad +\sum _{i=1}^{d-1}{\beta }_idW_i+(D_d^{\dagger })^{\top }(\beta -B^{\top }Y-\sum _{i=1}^{d-1}D_i^{\top }{\beta }_i)dW_d,\quad Y(0)=y. \end{aligned}$$
(18)

Due to Assumption 3 and the uniform boundedness of the primal-state coefficients, there exists a unique solution \(Y\in \mathcal {S}(0,T;\mathbb {R}^n)\), see [24, Theorem 1.6.16]. The dual problem (13) is equivalent to

$$\begin{aligned} \hbox { Minimize}\ \tilde{\Psi }(y,\alpha ,{\beta }_1,\ldots ,{\beta }_{d-1},\beta ):=x_0^{\top }y+E\left[ \int _0^T\phi (t,\alpha ,\beta )dt+h(Y(T))\right] . \end{aligned}$$
(19)

The adjoint equation associated with \((y,\alpha ,{\beta }_1,\ldots ,{\beta }_{d-1},\beta )\) and Y in (18) is given by

$$\begin{aligned} dp_2&=[(A-BD_d^{\dagger }C_d)p_2+BD_d^{\dagger }q_{2,d}]dt+\sum _{i=1}^dq_{2,i}dW_i,\nonumber \\ p_2(T)&=- h_y(Y(T)). \end{aligned}$$

Due to Assumption 2 and the uniform boundedness of the primal-state coefficients, there exists a unique solution \(p_2\in \mathcal {S}(0,T;\mathbb {R}^n)\), \(q_{2,i}\in \mathcal {H}(0,T;\mathbb {R}^n)\), \(i=1,\ldots ,d\), see [24, Theorem 7.2.2].

The next theorem states the SMP for the transformed dual problem (19).

Theorem 2

Let \((\hat{y},\hat{\alpha },\hat{\beta }_1,\ldots ,\hat{\beta }_{d-1},\hat{\beta })\) be admissible dual controls. Then, \((\hat{y},\hat{\alpha },\hat{\beta }_1,\ldots ,\hat{\beta }_{d-1},\hat{\beta })\) is optimal for the dual problem (19) if and only if the solution \((\hat{Y},\hat{p}_2,\hat{q}_2)\) of the FBSDE

$$\begin{aligned} d\hat{Y}&=[\hat{\alpha }-A^{\top }\hat{Y}-\sum _{i=1}^{d-1}C_i^{\top }\hat{\beta }_i-C_d^{\top }(D_d^{\dagger })^{\top }(\hat{\beta }-B^{\top }\hat{Y}-\sum _{i=1}^{d-1}D_i^{\top }\hat{\beta }_i)]dt\nonumber \\&\quad +\sum _{i=1}^{d-1}\hat{\beta }_idW_i+(D_d^{\dagger })^{\top }(\hat{\beta }-B^{\top }\hat{Y}-\sum _{i=1}^{d-1}D_i^{\top }\hat{\beta }_i)dW_d,\nonumber \\ \hat{Y}(0)&=\hat{y},\nonumber \\ d\hat{p}_2&=[(A-BD_d^{\dagger }C_d)\hat{p}_2+BD_d^{\dagger }\hat{q}_{2,d}]dt+\sum _{i=1}^d\hat{q}_{2,i}dW_i,\nonumber \\ \hat{p}_2(T)&=-h_y(\hat{Y}(T)) \end{aligned}$$
(20)

satisfies the conditions

$$\begin{aligned}&\hat{p}_2(0)=x_0, \nonumber \\&(\hat{p}_2,D_d^{\dagger }\hat{q}_{2,d}-D_d^{\dagger }C_d\hat{p}_2)\in \partial \phi (t,\hat{\alpha },\hat{\beta }),\nonumber \\&D_d^{\dagger }\hat{q}_{2,d}-D_d^{\dagger }C_d\hat{p}_2\in K,\nonumber \\&D_iD_d^{\dagger }C_d\hat{p}_2-C_i\hat{p}_2+\hat{q}_{2,i}-D_iD_d^{\dagger }\hat{q}_{2,d}=0,\quad \forall i=1,\ldots ,d-1, \end{aligned}$$
(21)

for \((P\otimes \hbox {Leb})\)-a.e. \((\omega ,t)\in \Omega \times [0,T]\), where \(\partial \phi (t,\hat{\alpha },\hat{\beta })\) is the subdifferential of \(\phi (t, \cdot , \cdot )\) at \((\hat{\alpha }(t),\hat{\beta }(t))\).

Proof

See Appendix. \(\square \)

We next state the results on primal–dual relation. We first make the following assumption:

Assumption 4

The function \(g_x(\omega ,\cdot ):\mathbb {R}^n\rightarrow \mathbb {R}^n\) is a bijection for any \(\omega \) such that \(z=-g_x(x)\) if and only if \(x=-h_y(z)\); that is, the inverse function of \(-g_x\) is \(-h_y\).

We can recover the primal optimal solution from that of the dual problem.

Theorem 3

Suppose \((\hat{y},\hat{\alpha },\hat{\beta }_1,\ldots ,\hat{\beta }_{d-1},\hat{\beta })\) is optimal for the dual problem (19). Let \((\hat{Y},\hat{p}_2,\hat{q}_2)\) be the associated state and adjoint processes in Theorem 2. Define

$$\begin{aligned} \hat{u}(t):=D_d^{\dagger }(t)\hat{q}_{2,d}(t)-D_d^{\dagger }(t)C_d(t)\hat{p}_2(t), \quad t\in [0,T]. \end{aligned}$$
(22)

Then, \(\hat{u}\) is the optimal control for the primal problem (3). For \(t\in [0,T]\), the optimal state and associated adjoint processes satisfy

$$\begin{aligned}&\hat{X}(t)=\hat{p}_2(t),\nonumber \\&\hat{p}_1(t)=\hat{Y}(t),\nonumber \\&\hat{q}_{1,i}(t)=\hat{\beta }_i(t),\quad \forall i=1,\ldots ,d-1,\nonumber \\&\hat{q}_{1,d}(t)=(D_d^{\dagger })^{\top }(t)(\hat{\beta }(t)-B^{\top }(t)\hat{Y}(t)-\sum _{i=1}^{d-1}D_i^{\top }(t)\hat{\beta }_i(t)). \end{aligned}$$
(23)

Proof

Suppose that \((\hat{y},\hat{\alpha },\hat{\beta }_1,\ldots ,\hat{\beta }_{d-1},\hat{\beta })\) is optimal for the dual problem. By Theorem 2, the process \((\hat{Y},\hat{p}_2,\hat{q}_2)\) solves FBSDE (20) and satisfies conditions (21).

Define \(\hat{u}(t)\) and \((\hat{X}(t),\hat{p}_1(t),\hat{q}_1(t))\) as in (22) and (23), respectively. From Theorem 2 and conditions (21),

$$\begin{aligned} \hat{u}(t)=D_d^{\dagger }(t)\hat{q}_{2,d}(t)-D_d^{\dagger }(t)C_d(t)\hat{p}_2(t)\in K,\quad P\text {-a.s.} \end{aligned}$$

and

$$\begin{aligned} (\hat{X}(t),\hat{u}(t))=(\hat{p}_2(t),D_d^{\dagger }(t)\hat{q}_{2,d}(t)-D_d^{\dagger }(t)C_d(t)\hat{p}_2(t))\in \partial \phi (t, \hat{\alpha }(t),\hat{\beta }(t)), \end{aligned}$$

which is equivalent to

$$\begin{aligned} (\hat{\alpha }(t),\hat{\beta }(t))\in \partial \tilde{f}(t, \hat{X}(t),\hat{u}(t)). \end{aligned}$$

Since \(\tilde{f}(t,x,u)=f(t,x,u)+\Psi _K(u)\) and f is \(C^1\) in x, we have

$$\begin{aligned} \hat{\alpha }=f_x(t,\hat{X},\hat{u}), \quad \hat{\beta }\in \partial _u f(t,\hat{X},\hat{u})+N_K(\hat{u}) \end{aligned}$$
(24)

for \((P\otimes \text {Leb})\)-a.e. \((\omega ,t)\in \Omega \times [0,T]\), where \(\partial _uf(t,\hat{X},\hat{u})\) is the subdifferential of f with respect to u at \((t,\hat{X}(t),\hat{u}(t))\) and \(N_K(\hat{u}(t))=\{p\in \mathbb {R}^m:p^{\top }(u-\hat{u}(t))\le 0,\forall u\in K\}\) is the normal cone of K at \(\hat{u}(t)\).

Using the last condition in (21) and (22) yields

$$\begin{aligned} \hat{q}_{2,i}&=D_iD^{\dagger }_d\hat{q}_{2,d}-D_iD_d^{\dagger }C_d\hat{p}_2 +C_i\hat{p}_2=D_i\hat{u}+C_i\hat{p}_2. \end{aligned}$$
(25)

Combining (22), (23), (24), and (25) yields

$$\begin{aligned} d\hat{X}&=d\hat{p}_2\\&=[(A-BD_d^{\dagger }C_d)\hat{p}_2+BD_d^{\dagger }\hat{q}_{2,d}]dt+\sum _{i=1}^d\hat{q}_{2,i}dW_i\\&=[A\hat{X}+B\hat{u}]dt+\sum _{i=1}^{d}[C_i\hat{X}+D_i\hat{u}]dW_i \end{aligned}$$

and

$$\begin{aligned} d\hat{p}_1&=d\hat{Y}\\&=[\hat{\alpha }-A^{\top }\hat{Y}-\sum _{i=1}^{d-1}C_i^{\top }\hat{\beta }_i-C_d^{\top }(D_d^{\dagger })^{\top }(\hat{\beta }-B^{\top }\hat{Y}-\sum _{i=1}^{d-1}D_i^{\top }\hat{\beta }_i)]dt\\&\quad +\sum _{i=1}^{d-1}\hat{\beta }_idW_i+(D_d^{\dagger })^{\top }(\hat{\beta }-B^{\top }\hat{Y}-\sum _{i=1}^{d-1}D_i^{\top }\hat{\beta }_i)dW_d\\&=[f_x(t,\hat{X},\hat{u})-A^{\top }\hat{p}_1-\sum _{i=1}^{d}C_i^{\top }\hat{q}_{1,i}]dt+\sum _{i=1}^{d}\hat{q}_{1,i}dW_i. \end{aligned}$$

We now verify the initial condition \(\hat{X}(0)=x_0\) and the terminal condition \(\hat{p}_1(T)=-g_x(\hat{X}(T))\). From the first condition in (21), \(\hat{p}_2(0)=x_0\). Since the inverse function of \(-g_x\) is \(-h_y\) by Assumption 4,

$$\begin{aligned} -h_y(\hat{Y}(T))=\hat{p}_2(T)=\hat{X}(T), \end{aligned}$$

which implies that \(\hat{Y}(T)=-g_x(\hat{X}(T))\). Hence, \((\hat{X},\hat{p}_1,\hat{q}_1)\) solves the primal FBSDE (5).

Combining (23) and (24) yields

$$\begin{aligned} B^{\top } \hat{p}_1+\sum _{i=1}^{d}D_i^{\top }\hat{q}_{1,i}&=B^{\top } \hat{Y}+\sum _{i=1}^{d-1}D_i^{\top }\hat{\beta }_i+D_d^{\top }\hat{q}_{1,d}=\hat{\beta }\in \partial _u \tilde{f}(t,\hat{X},\hat{u}), \end{aligned}$$

for \((P\otimes \text {Leb})\)-a.e. \((\omega ,t)\in \Omega \times [0,T]\), that is,

$$\begin{aligned} 0 \in -(B^{\top } \hat{p}_1+\sum _{i=1}^{d}D_i^{\top }\hat{q}_{1,i}) + \partial _u f(t,\hat{X},\hat{u}) + N_K(\hat{u}), \end{aligned}$$

which shows \(\hat{u}\) is the minimum point of \(-H(t,\hat{X},u,\hat{p}_1,\hat{q}_1)\) over \(u\in K\). Hence, condition (6) is satisfied. Using Theorem 1, \(\hat{u}\) is optimal for the primal problem. \(\square \)

We can also recover the dual optimal solution from that of the primal problem.

Theorem 4

Suppose that \(\hat{u}\in \mathcal {A}\) is optimal for the primal problem (3). Let \((\hat{X},\hat{p}_1,\hat{q}_1)\) be the associated state and adjoint processes in Theorem 1. Define

$$\begin{aligned}&\hat{y}=\hat{p}_1(0),\nonumber \\&\hat{\alpha }(t)=f_x(t,\hat{X}(t),\hat{u}(t)),\nonumber \\&\hat{\beta }_i(t)=\hat{q}_{1,i}(t),\quad \forall i=1,\ldots ,d-1,\nonumber \\&\hat{\beta }(t)=B^{\top }(t)\hat{p}_1(t)+\sum _{i=1}^{d}D_i^{\top }(t)\hat{q}_{1,i}(t). \end{aligned}$$
(26)

Then, \((\hat{y},\hat{\alpha },\hat{\beta }_1,\ldots ,\hat{\beta }_{d-1},\hat{\beta })\) is the optimal control of the dual problem (19). For \(t\in [0,T]\), the optimal dual-state process and associated adjoint processes satisfy

$$\begin{aligned}&\hat{Y}(t)=\hat{p}_1(t),\nonumber \\&\hat{p}_2(t)=\hat{X}(t),\nonumber \\&\hat{q}_{2,i}(t)=D_i(t)\hat{u}(t)+C_i(t)\hat{X}(t),\quad \forall i=1,\ldots ,d-1,\nonumber \\&D_d^{\dagger }(t)\hat{q}_{2,d}(t)=\hat{u}(t)+D_d^{\dagger }(t)C_d(t)\hat{X}(t). \end{aligned}$$
(27)

Proof

Suppose that \(\hat{u}\in \mathcal {A}\) is optimal for the primal problem. By Theorem 1, the process \((\hat{X},\hat{p}_1,\hat{q}_1)\) solves the primal FBSDE (5) and satisfies condition (6).

Define \((\hat{y},\hat{\alpha },\hat{\beta }_1,\ldots ,\hat{\beta }_{d-1},\hat{\beta })\) and \((\hat{Y},\hat{p}_2,\hat{q}_2)\) as in (26) and (27), respectively. Then,

$$\begin{aligned} d\hat{Y}&=d\hat{p}_1\\&=-[A^{\top }\hat{p}_1+\sum _{i=1}^dC_i^{\top }\hat{q}_{1,i}-f_x(t,\hat{X},\hat{u})]dt+\sum _{i=1}^d\hat{q}_{1,i}dW_i\\&=-[A^{\top }\hat{Y}+\sum _{i=1}^{d-1}C_i^{\top }\hat{\beta }_i+C_d^{\top }(D_d^{\dagger })^{\top }(\hat{\beta }-B^{\top }\hat{Y}-\sum _{i=1}^{d-1}D_i^{\top }\hat{\beta }_i)-\hat{\alpha }]dt\\&\qquad +\sum _{i=1}^{d-1}\hat{\beta }_{i}dW_i+(D_d^{\dagger })^{\top }(\hat{\beta }-B^{\top }\hat{Y}-\sum _{i=1}^{d-1}D_i^{\top }\hat{\beta }_i)dW_d \end{aligned}$$

and

$$\begin{aligned} d\hat{p}_2&=d\hat{X}\\&=[A\hat{X}+B\hat{u}]dt+\sum _{i=1}^d[C_i\hat{X}+D_i\hat{u}]dW_i\\&=[A\hat{p}_2-BD_d^{\dagger }C_d\hat{p}_2+BD_d^{\dagger }\hat{q}_{2,d}]dt+\sum _{i=1}^d\hat{q}_{2,i}dW_i. \end{aligned}$$

We now verify the initial condition \(\hat{Y}(0)=\hat{y}\) and the terminal condition \(\hat{p}_2(T)=-h_y(\hat{Y}(T))\). The initial condition holds by the first definition in (26), \(\hat{y}=\hat{p}_1(0)\). Since \(-g_x(\hat{X}(T))=\hat{p}_1(T)=\hat{Y}(T)\), Assumption 4 gives \(\hat{X}(T)=-h_y(\hat{Y}(T))\). Hence, \((\hat{Y},\hat{p}_2,\hat{q}_2)\) solves the dual FBSDE (20).

From (27),

$$\begin{aligned} \hat{p}_2(0)=\hat{X}(0)=x_0 \end{aligned}$$

and

$$\begin{aligned} D_d^{\dagger }(t)\hat{q}_{2,d}(t)-D_d^{\dagger }(t)C_d(t)\hat{p}_2(t)=\hat{u}(t)\in K, \end{aligned}$$

which are the first two conditions of (21). Using condition (6), the concavity of H defined in (4), and (26), we have

$$\begin{aligned} 0\in \partial _u(-H(t,\hat{X},u,\hat{p}_1,\hat{q}_1))= -(B^{\top } \hat{p}_1+\sum _{i=1}^{d}D_i^{\top }\hat{q}_{1,i}) + \partial _u \tilde{f}(t,\hat{X},\hat{u}), \end{aligned}$$

which implies that \(\hat{\beta }\in \partial _u\tilde{f}(t,\hat{X}(t),\hat{u}(t))\). Consequently, from the second definition in (26),

$$\begin{aligned} (\hat{\alpha }(t),\hat{\beta }(t))\in \partial \tilde{f}(t, \hat{X}(t),\hat{u}(t)), \end{aligned}$$

which, due to \(\tilde{f}\) being a proper closed convex function, is equivalent to

$$\begin{aligned} (\hat{p}_2(t),D_d^{\dagger }(t)\hat{q}_{2,d}(t)-D_d^{\dagger }(t)C_d(t)\hat{p}_2(t))\in \partial \phi (t, \hat{\alpha }(t),\hat{\beta }(t)), \end{aligned}$$

the third condition of (21). The fourth condition of (21) is immediate from the definition of \(\hat{q}_{2,i}(t)\) and \(\hat{q}_{2,d}(t)\) in (27). \(\square \)

4 Examples

In this section, we construct some multidimensional examples to show that solving the primal problem via its dual formulation is easier than solving it directly. Each example has at least one of the following features: control constraint, nonsmooth running cost, and random coefficients.

Assume \(2\le n<m\), \(d=1\), \(\hbox {rank}(D(t))=n\), the Moore–Penrose inverse of D(t) is given by \(D^{\dagger }(t)=D(t)^{\top }(D(t)D(t)^{\top })^{-1}\), \(B, D, D^{\dagger }\) are uniformly bounded processes, and X satisfies the SDE

$$\begin{aligned} dX&=\left[ AX+Bu\right] dt+\left[ CX+Du\right] dW,\; t\in [0,T]\nonumber \\ X(0)&=x_0\in \mathbb {R}^n, \end{aligned}$$
(28)

where \(A=-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\), \(C=-(D^{\dagger })^{\top }B^{\top }\), \(u(t)\in K\), a closed convex set in \(\mathbb {R}^m\). Consider the following problem

$$\begin{aligned} \hbox {Minimize } J(u):=E\left[ \int _0^Tf(u(t))dt+\frac{1}{2}X(T)^{\top }X(T)\right] . \end{aligned}$$
(29)

We suppress the time variable t from now on for simplicity of notation. This is a special case of model (1) with \(C_d=C\), \(D_d=D\), \(D^{\dagger }_d=D^{\dagger }\), \(W_d=W\), \(g(x)=\frac{1}{2}x^{\top }x\), and f is a convex function.

We assume the following condition for the coefficients of SDE (28):

Assumption 5

The matrix B satisfies \(B-BD^{\dagger }D\ne 0\).

Remark 5

Assumption 5 implies that \(B\ne D\) as otherwise \(B-BD^{\dagger }D=0\) from the property of the Moore–Penrose inverse. Since D has full row rank n, \(DD^{\dagger }=I_n\). We also know that \(D^{\dagger } D \ne I_m\), which can be easily proved as follows: Since \(n<m\), the columns of D are linearly dependent and there exists a nonzero vector \(z\in \mathbb {R}^m\) such that \(Dz=0\). Now, assume \(D^{\dagger } D = I_m\), then \(z=D^{\dagger }D z =0\), a contradiction, and therefore, \(D^{\dagger } D \ne I_m\).

We may attempt several methods to solve (29). The first one is to solve (29) directly via the cost functional. Applying Itô’s formula to \(X^{\top }X\) yields

$$\begin{aligned} d[X^{\top }X]&=[2X^{\top }(B-BD^{\dagger }D)u+u^{\top }D^{\top }Du]dt+\hbox {martingale}. \end{aligned}$$

Hence,

$$\begin{aligned} J(u)&=\frac{1}{2}x_0^{\top }x_0+E\left[ \int _0^T[f(u)+X^{\top }(B-BD^{\dagger }D)u+\frac{1}{2}u^{\top }D^{\top }Du]dt\right] . \end{aligned}$$

The second one is to use the SMP and maximize the Hamiltonian over \(u\in K\). Write \(q_1:=q_{1,d}\). The Hamiltonian \(H:\Omega \times [0,T]\times \mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^n\times \mathbb {R}^n\rightarrow \mathbb {R}\) for the primal problem is given by

$$\begin{aligned} H(\omega ,t,x,u, p_1, q_1):=[Ax+Bu]^{\top }p_1+[Cx+Du]^{\top }q_1-f(u), \end{aligned}$$
(30)

where \((p_1,q_1)\) satisfies the adjoint equation, given by

$$\begin{aligned}&dp_1=-[A^{\top }p_1+C^{\top }q_{1}]dt+q_{1}dW,\nonumber \\&p_1(T)=-X(T). \end{aligned}$$
(31)

The control \(\hat{u}\in K\) is optimal if and only if

$$\begin{aligned} H(t,\hat{X},\hat{u}, p_1, q_1)&=\max _{u\in K}H(t,\hat{X},u, p_1, q_1). \end{aligned}$$
(32)

The third one is to apply the dynamic programming principle when all coefficients are deterministic. Define the value function v as

$$\begin{aligned} v(t,x)&=\displaystyle \inf _{u\in \mathcal {A}[t,T]}E\left[ \int _t^Tf(u(s))ds+\frac{1}{2}X(T)^{\top }X(T)\Big | X(t)=x\right] ,\\&\quad (t,x)\in [0,T)\times \mathbb {R}^n, \end{aligned}$$

where \(\mathcal {A}[t,T]:=\{u\in \mathcal {H}(t,T;\mathbb {R}^m):u(s)\in K \hbox { for a.e. } s\in [t,T]\}\). The HJB equation is given by

$$\begin{aligned}&v_t(t,x)+\inf _{u\in K}\left\{ v_x(t,x)^{\top }(Ax+Bu)+\frac{1}{2} (Cx+Du)^{\top }v_{xx}(t,x)(Cx+Du)+f(u)\right\} =0,\nonumber \\&v(T,x)=g(x). \end{aligned}$$
(33)

The fourth one is to solve the reformulated dual problem (19). The dual functions \(\phi (t,\alpha ,\beta )\) and h(y), defined in (10) and (11), are given by

$$\begin{aligned} \phi (t,\alpha ,\beta )&=\sup _{x,u\in K}\left\{ x^{\top }\alpha +u^{\top }\beta -f(u)\right\} , \end{aligned}$$

and

$$\begin{aligned} h(y)&=\displaystyle \sup _{x}\left\{ -x^{\top }y-\frac{1}{2}x^{\top }x\right\} =\frac{1}{2}y^{\top }y. \end{aligned}$$

Since there are no constraints on the state process and the running cost is free of the state variable, the function \(\phi (t,\alpha ,\beta )=+\infty \) if \(\alpha \ne 0\). To make the dual objective function finite, we must have \(\alpha =0\). We then write

$$\begin{aligned} \phi (\beta ):=\phi (t,0,\beta )&=\sup _{u}\left\{ u^{\top }\beta -\tilde{f}(u)\right\} . \end{aligned}$$

The dual-state process Y satisfies the SDE (see (18))

$$\begin{aligned} d{Y}=\left[ - A^{\top }Y - C^{\top } (D^{\dagger })^{\top } (\beta - B^{\top } Y)\right] dt+(D^{\dagger })^{\top }[\beta -B^{\top }{Y}]dW,\nonumber \\ {Y}(0)={y}, \end{aligned}$$
(34)

and the dual problem is defined by (see (19))

$$\begin{aligned} \hbox {Minimize }\tilde{\Psi }(y,{\beta })&=x_0^{\top }y+E\left[ \int _0^T\phi (\beta (t))dt+\frac{1}{2}Y(T)^{\top }Y(T)\right] . \end{aligned}$$

Applying Itô’s formula to \(Y^{\top }Y\) yields

$$\begin{aligned} d[Y^{\top }Y]&=\beta ^{\top }D^{\dagger }(D^{\dagger })^{\top }\beta dt+\hbox {martingale}. \end{aligned}$$

The dual objective function \(\tilde{\Psi }\) can then be written as

$$\begin{aligned} \tilde{\Psi }(y,{\beta })&=x_0^{\top }y+\frac{1}{2}y^{\top }y+E\left[ \int _0^T(\phi (\beta )+\frac{1}{2}\beta ^{\top }D^{\dagger }(D^{\dagger })^{\top }\beta )dt\right] . \end{aligned}$$
(35)

We next discuss different forms of f and K and show the usefulness of the dual formulation in finding the optimal solutions. Write \(u=(u_1,\ldots ,u_m)^{\top }\in \mathbb {R}^m\) and \(y^+=\max (0,y)\).

4.1 Zero Running Cost and Control Constraint

Assume \(f(u)=0\) and \(K=[-1,1]^m\). There is no running cost, but there is a bounded control constraint set.

We first use the cost functional method. Since \(n<m\) implies \(D^{\dagger }D\ne I_m\) and Assumption 5 holds, the cross term \(X^{\top }(B-BD^{\dagger }D)u\) does not vanish, so we cannot immediately infer that the minimum of the cost functional J in (29) is attained at \(u=0\).

We next use the primal SMP method. Since \(f=0\), the Hamiltonian H is a linear function of u. From \(K=[-1,1]^m\) and (32), the optimal control is \(\hat{u}=\hbox {sgn}(B^{\top }p_1+D^{\top }q_1)\) (componentwise), a bang–bang control. Substituting \(\hat{u}\) into SDE (28) and BSDE (31), we then need to solve a fully coupled nonlinear FBSDE. Moreover, if \(B^{\top }p_1+D^{\top }q_1=0\), then the Hamiltonian H is free of u and provides no information on the form of \(\hat{u}\).

We then use the HJB method. However, solving the PDE (33) with an ansatz is difficult since the equation is multidimensional and there is a control constraint. The ansatz method may work when the control is unconstrained, but if the running cost f is not quadratic, the method remains difficult even without a control constraint.

We now try the dual method. The dual function \(\phi \) has the following form

$$\begin{aligned} \phi (\beta )=\displaystyle \sup _{u\in [-1,1]^m}\{u^{\top }\beta \} =\sum _{i=1}^m \sup _{u_i\in [-1,1]} \left\{ u_i\beta _i\right\} =\sum _{i=1}^m|\beta _i|. \end{aligned}$$

Note that \(h, \phi \) satisfy Assumption 2. The minimum of the dual objective function \(\tilde{\Psi }\) in (35) is clearly attained uniquely at \(y=-x_0\) and \(\beta =0\). Hence, \((\hat{y},\hat{\beta })=(-x_0,0)\) is the dual optimal control. By Theorem 2, the solution \((\hat{Y},\hat{p}_2,\hat{q}_2)\) to the following dual FBSDE

$$\begin{aligned}&d\hat{Y}=-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{Y}dt-(D^{\dagger })^{\top }B^{\top }\hat{Y}dW,\nonumber \\&\hat{Y}(0)=\hat{y},\nonumber \\&d\hat{p}_2=[\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2+BD^{\dagger }\hat{q}_2]dt+\hat{q}_2dW,\nonumber \\&\hat{p}_2(T)=-\hat{Y}(T), \end{aligned}$$
(36)

satisfies the conditions

$$\begin{aligned}&\hat{p}_2(0)=x_0,\nonumber \\&D^{\dagger }\hat{q}_2+D^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2\in \partial \phi (0)=[-1,1]^m. \end{aligned}$$
(37)

The solution to the SDE in (36) is given by (see [24, Theorem 1.6.14])

$$\begin{aligned} \hat{Y}(t)=\Phi _1(t)\hat{y}=-\Phi _1(t)x_0, \end{aligned}$$

where \(\Phi _1(t)\in \mathbb {R}^{n\times n} \) is the unique solution of the following matrix-valued SDE

$$\begin{aligned}&d\Phi _1=-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\Phi _1 dt-(D^{\dagger })^{\top }B^{\top }\Phi _1 dW,\nonumber \\&\Phi _1(0)=I_n. \end{aligned}$$
(38)

Define \(\Phi _2(t)\in \mathbb {R}^{n\times n}\) that satisfies

$$\begin{aligned} d\Phi _2=[\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\Phi _2+(BD^{\dagger })^2\Phi _2]dt+BD^{\dagger }\Phi _2dW,\nonumber \\ \Phi _2(0)=I_n. \end{aligned}$$

Since the primal-state coefficients are uniformly bounded, \(\Phi _1,\Phi _2\in \mathcal {S}(0,T;\mathbb {R}^{n\times n})\). The solution to the BSDE in (36) is given by (see [24, Theorem 7.2.2])

$$\begin{aligned} \hat{p}_2(t)&=-\Phi _2(t)E[\Phi _1^{\top }(T)\hat{Y}(T)|\mathcal {F}_t]=\Phi _2(t)E[\Phi _1^{\top }(T)\Phi _1(T)|\mathcal {F}_t]x_0. \end{aligned}$$

Applying Itô’s formula to \(\Phi _1^{\top }\Phi _1\) yields

$$\begin{aligned} d[\Phi _1^{\top }\Phi _1]&=-[\Phi _1^{\top }(D^{\dagger })^{\top }B^{\top }\Phi _1+\Phi _1^{\top }BD^{\dagger }\Phi _1]dW. \end{aligned}$$

Since \(\Phi _1\in \mathcal {S}(0,T;\mathbb {R}^{n\times n})\) and B and \(D^{\dagger }\) are uniformly bounded,

$$\begin{aligned} E\left[ (\int _0^T\left| \Phi _1^{\top }(D^{\dagger })^{\top }B^{\top }\Phi _1+\Phi _1^{\top }BD^{\dagger }\Phi _1\right| ^2ds)^{\frac{1}{2}}\right] <\infty . \end{aligned}$$

By the BDG inequality,

$$\begin{aligned} \int _0^t[\Phi _1^{\top }(D^{\dagger })^{\top }B^{\top }\Phi _1+\Phi _1^{\top }BD^{\dagger }\Phi _1]dW(s), \quad 0\le t\le T \end{aligned}$$

is a uniformly integrable martingale, so \(\Phi _1^{\top }\Phi _1\) is a martingale. Applying Itô’s formula to \(\Phi _1^{\top }\Phi _2\) yields \( d[\Phi _1^{\top }\Phi _2]=0\); therefore, \(\Phi _1^{\top }(t)\Phi _2(t)=\Phi _1^{\top }(0)\Phi _2(0)=I_n\) and \(\Phi _2(t)=[\Phi _1^{\top }(t)]^{-1}\) for all \(t\in [0,T]\), P-a.s. We then obtain

$$\begin{aligned} \hat{p}_2(t)&=\Phi _2(t)\Phi _1^{\top }(t)\Phi _1(t)x_0=\Phi _1(t)x_0, \end{aligned}$$

which implies that \(\hat{p}_2\) satisfies the following SDE

$$\begin{aligned}&d\hat{p}_2=-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2dt-(D^{\dagger })^{\top }B^{\top }\hat{p}_2dW,\nonumber \\&\hat{p}_2(0)=x_0. \end{aligned}$$

Note that the initial condition \(\hat{p}_2(0)=x_0\) is exactly the first condition in (37). Comparing the dynamics above with that of the BSDE in (36), we have \(\hat{q}_2=-(D^{\dagger })^{\top }B^{\top }\hat{p}_2\). Hence,

$$\begin{aligned} D^{\dagger }\hat{q}_2+D^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2=0\in [-1,1]^m, \end{aligned}$$

which satisfies the second condition in (37). By Theorem 3, the optimal control for the primal problem is given by

$$\begin{aligned} \hat{u}=D^{\dagger }\hat{q}_2+D^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2=0 \end{aligned}$$

and the corresponding state process \(\hat{X}=\hat{p}_2\), that is, \( \hat{X}(t)=\Phi _1(t)x_0\) for \(t\in [0,T]\).
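A simulation sketch can corroborate this solution: simulate SDE (28) by Euler–Maruyama and compare the Monte Carlo cost of \(\hat{u}=0\) with that of randomly chosen constant controls in K. The matrices B and D below are illustrative constants satisfying Assumption 5 (borrowed from the specification in Sect. 4.3 with \(\sin W(t)\) replaced by 1), not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
B = np.ones((2, 3))                          # constant stand-in for B (Assumption 5 holds)
D = np.array([[2., -1., 1.], [-1., 2., 1.]]) / 3.
Ddag = D.T @ np.linalg.inv(D @ D.T)
A = -0.5 * B @ Ddag @ Ddag.T @ B.T
C = -Ddag.T @ B.T
x0 = np.array([1.0, -1.0])
T, N, n_paths = 1.0, 100, 500
dt = T / N

def cost(u_const):
    """Monte Carlo estimate of J(u) = E[(1/2)|X(T)|^2] for a constant control."""
    total = 0.0
    for _ in range(n_paths):
        X = x0.copy()
        for _k in range(N):
            dW = rng.normal(0.0, np.sqrt(dt))
            X = X + (A @ X + B @ u_const) * dt + (C @ X + D @ u_const) * dW
        total += 0.5 * X @ X
    return total / n_paths

print("J(0)      =", cost(np.zeros(3)))
for _ in range(3):
    u = rng.uniform(-1, 1, size=3)           # admissible constant control in K
    print("J(u_rand) =", cost(u))            # should exceed J(0) up to Monte Carlo error
```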

Remark 6

Suppose \(n<m\) and \(B=D\). Then Assumption 5 fails, and we can immediately infer that the minimum of the cost functional (29) is attained at \(\hat{u}=0\). The corresponding primal-state process satisfies

$$\begin{aligned}&d\hat{X}=-{1\over 2}\hat{X}dt-\hat{X}dW, \nonumber \\&\hat{X}(0)=x_0. \end{aligned}$$

One can easily check with Itô’s formula that the solution is \(\hat{X}(t)=\exp (-t-W(t))x_0\). Solving the primal problem via the dual problem also yields the same solution.

Remark 7

Suppose \(n=m\) and \(B\ne kD\) for any \(k\in \mathbb {R}\). Since D is a square matrix with full row rank, D is nonsingular and \(D^{\dagger }=D^{-1}\). Assumption 5 is not satisfied since \(B(I_m-D^{\dagger }D)=0\). We can again immediately infer that the optimal control of the primal problem is \(\hat{u}=0\). The corresponding primal-state process is then

$$\begin{aligned}&d\hat{X}=-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{X}dt-(D^{\dagger })^{\top }B^{\top }\hat{X}dW,\\&\hat{X}(0)=x_0, \end{aligned}$$

or equivalently, \(\hat{X}(t)=\Phi _1(t)x_0\), which is the same solution obtained via the dual problem.

4.2 Nonsmooth Running Cost and No Control Constraint

Assume \(f(u)=\sum _{i=1}^m[(u_i-1)^++(-u_i-1)^+]\) and \(K=\mathbb {R}^m\). The running cost is a convex nonsmooth function, and there are no control constraints.

Similar to the example in Sect. 4.1, the first method does not work since we cannot immediately infer that the minimum of the cost functional J can be attained at \(u=0\).

From (32), the optimal control \(\hat{u}\in \mathbb {R}^m\) satisfies

$$\begin{aligned}&\max _{u\in \mathbb {R}^m}H(t,\hat{X},u, p_1, q_1)=[-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{X}+B\hat{u}]^{\top }p_1\\&\quad +[-(D^{\dagger })^{\top }B^{\top }\hat{X}+D\hat{u}]^{\top }q_1 - f(\hat{u}). \end{aligned}$$

Since H is not everywhere differentiable in u, the usual gradient method for finding the maximum point does not work here. For each \(i=1,\ldots ,m\), three cases must be considered: \(u_i< -1\), \(u_i\in [-1,1]\), and \(u_i > 1\), so maximizing the Hamiltonian by a combinatorial approach requires dealing with \(3^m\) cases in total.

Although there are no constraints imposed on the control variable, solving the PDE (33) using an ansatz would still be difficult since it is multidimensional and the last term inside the infimum is not differentiable with respect to u.

We now solve the primal problem via its dual problem. Function \(\phi \) has the following form:

$$\begin{aligned} \phi (\beta )&= \sup _{u}\{u^{\top }\beta -f(u)\}= \sum _{i=1}^m\sup _{u_i}\left\{ u_i\beta _i-({u}_i-1)^+-(-{u}_i-1)^+\right\} . \end{aligned}$$

Write \(\theta _i:=u_i\beta _i-({u}_i-1)^+-(-{u}_i-1)^+\). We deal with three cases.

Case I: Suppose \(-1\le u_i\le 1\). Then,

$$\begin{aligned} \sup _{-1\le u_i\le 1}\theta _i&=\sup _{-1\le u_i\le 1}\left\{ u_i\beta _i\right\} =|\beta _i|. \end{aligned}$$

Case II: Suppose \(u_i>1\). Then,

$$\begin{aligned} \sup _{u_i>1}\theta _i&=\sup _{u_i>1}\left\{ u_i(\beta _i-1)+1\right\} ={\left\{ \begin{array}{ll} \beta _i,&{}\hbox { if}\ \beta _i\le 1,\\ \infty ,&{}\hbox {otherwise}. \end{array}\right. } \end{aligned}$$

Case III: Suppose \(u_i<-1\). Then,

$$\begin{aligned} \sup _{u_i<-1}\theta _i&=\sup _{u_i<-1}\left\{ u_i(\beta _i+1)+1\right\} ={\left\{ \begin{array}{ll} -\beta _i,&{}\hbox { if}\ \beta _i\ge -1,\\ \infty ,&{}\hbox {otherwise}. \end{array}\right. } \end{aligned}$$

Taking the maximum over all cases yields

$$\begin{aligned} \sup _{u_i}\theta _i&={\left\{ \begin{array}{ll} |\beta _i|,&{}\hbox { if}\ -1\le \beta _i\le 1,\\ \infty ,&{}\hbox {otherwise}. \end{array}\right. }=|\beta _i|+\Psi _{[-1,1]}(\beta _i). \end{aligned}$$

Therefore,

$$\begin{aligned} \phi (\beta )&=\sum _{i=1}^m\left[ |\beta _i|+\Psi _{[-1,1]}(\beta _i)\right] . \end{aligned}$$
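A quick numerical sanity check of this conjugate (a brute-force grid computation, purely illustrative):

```python
import numpy as np

def phi_numeric(beta, radius=50.0, n_grid=200001):
    # brute-force conjugate sup_u { u*beta - f(u) } for one coordinate
    u = np.linspace(-radius, radius, n_grid)
    return np.max(u * beta - np.maximum(u - 1, 0) - np.maximum(-u - 1, 0))

for beta in [-0.5, 0.0, 0.9]:
    print(beta, phi_numeric(beta), abs(beta))   # agrees with |beta| on [-1, 1]
# for |beta| > 1 the grid maximum grows with the radius, reflecting phi(beta) = +infinity
```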

The function \(\phi \) satisfies Assumption 2. The dual function \(\tilde{\Psi }\) can then be written as

$$\begin{aligned} \tilde{\Psi }(y,{\beta })&=x_0^{\top }y +\frac{1}{2}y^{\top }y+\frac{1}{2}E\left[ \int _0^T[\beta ^{\top }D^{\dagger }(D^{\dagger })^{\top }\beta +2\sum _{i=1}^m(|\beta _i|+\Psi _{[-1,1]}(\beta _i))]dt\right] . \end{aligned}$$

Similar to the example in Sect. 4.1, \((\hat{y},\hat{\beta })=(-x_0,0)\) is the dual optimal control, and the primal optimal control is \(\hat{u}=0\) with the corresponding state process \(\hat{X}(t)=\Phi _1(t)x_0\).

4.3 Random Coefficients

Assume the same state process (28), but with the following specifications:

$$\begin{aligned} B(t)=\sin W(t)\begin{bmatrix} 1&{}1&{}1\\ 1&{}1&{}1 \end{bmatrix}\quad \hbox {and}\quad D(t)=\frac{1}{3}\begin{bmatrix} 2&{}-1&{}1\\ -1&{}2&{}1 \end{bmatrix}. \end{aligned}$$

This implies that B is random, D is deterministic, and

$$\begin{aligned} D^{\dagger }=\begin{bmatrix} 1&{}0\\ 0&{}1\\ 1&{}1 \end{bmatrix} \quad \hbox {and} \quad B(I_m-D^{\dagger }D)=\frac{1}{3}\sin W(t)\begin{bmatrix} 1&{}1&{}-1\\ 1&{}1&{}-1 \end{bmatrix}. \end{aligned}$$
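These matrix identities can be verified directly (a plain NumPy check of the displayed formulas):

```python
import numpy as np

D = np.array([[2., -1., 1.], [-1., 2., 1.]]) / 3.
Ddag = D.T @ np.linalg.inv(D @ D.T)
print(Ddag)                                 # [[1,0],[0,1],[1,1]]
assert np.allclose(D @ Ddag, np.eye(2))     # rank(D) = 2, so D D^dagger = I_2
# B(I_3 - D^dagger D) divided by sin W(t); times 3 it is [[1,1,-1],[1,1,-1]]
print(3 * np.ones((2, 3)) @ (np.eye(3) - Ddag @ D))
```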

Although \(\sin W(t)=0\) whenever \(W(t)=k\pi \), \(k\in \mathbb {Z}\), the set on which \(B(I_m-D^{\dagger }D)=0\) has measure zero. Hence, Assumption 5 is satisfied. We can then rewrite the SDE (28) as

$$\begin{aligned} dX(t)&=[-4\mathbb {J}\sin ^2W(t)X(t)+B(t)u(t)]dt+\left[ -2\mathbb {J}\sin W(t)X(t)+D(t)u(t)\right] dW(t),\nonumber \\ X(0)&=x_0\in \mathbb {R}^2, \end{aligned}$$

where \(\mathbb {J}\) is the \(2\times 2\) matrix of ones. Assume \(f(u)=0\) and \(K=[-1,1]^3\). The corresponding solution \((\hat{Y},\hat{p}_2,\hat{q}_2)\) to the dual optimal control \((\hat{y},\hat{\beta })=(-x_0,0)\) satisfies the FBSDE (36). The solution to the SDE in (36) is given by \(\hat{Y}(t)=\Phi _3(t)\hat{y}\), where \(\Phi _3\) is the \(2\times 2\) fundamental matrix satisfying

$$\begin{aligned}&d\Phi _3(t)=-4\mathbb {J}\sin ^2W(t)\Phi _3(t)dt-2\mathbb {J}\sin W(t)\Phi _3(t)dW(t)\nonumber \\&\Phi _3(0)=I_2. \end{aligned}$$
(39)

Due to the randomness in both coefficients, we cannot use the result in [17, page 101], which would otherwise immediately give an explicit solution to the above SDE. However, having constant and commuting coefficients is not a necessary condition. Write

$$\begin{aligned} \hat{\Phi }_3(t):=\exp (-2\mathbb {J}Z(t))=\sum _{k=0}^{\infty }\frac{1}{k!}(-2\mathbb {J})^kZ(t)^k, \end{aligned}$$

where

$$\begin{aligned}&dZ(t)=4\sin ^2W(t)dt+\sin W(t)dW(t),\nonumber \\&Z(0)=0. \end{aligned}$$

We can further simplify the infinite series by diagonalisation of matrix \(\mathbb {J}\). The eigenvalues of \(\mathbb {J}\) are 0 and 2 with respective eigenvectors \(v_1=(1,-1)^{\top }\) and \(v_2=(1,1)^{\top }\). We can decompose \(\mathbb {J}\) as \(\mathbb {J}=PDP^{-1}\), where

$$\begin{aligned} P=\begin{bmatrix} 1&{}1\\ -1&{}1 \end{bmatrix},\quad D=\begin{bmatrix} 0&{}0\\ 0&{}2 \end{bmatrix},\quad P^{-1}=\frac{1}{2}\begin{bmatrix} 1&{}-1\\ 1&{}1 \end{bmatrix}. \end{aligned}$$

Hence,

$$\begin{aligned} \hat{\Phi }_3(t)&=\sum _{k=0}^{\infty }\frac{1}{k!}(-2Z(t))^k PD^k P^{-1}=P\begin{bmatrix} 1&{}0\\ 0&{}e^{-4Z(t)} \end{bmatrix}P^{-1}=I_2+\frac{1}{2}(e^{-4Z(t)}-1)\mathbb {J}. \end{aligned}$$

We want to show that \(\hat{\Phi }_3\) is the solution of (39). Applying Itô’s formula to \(\hat{\Phi }_3\) yields

$$\begin{aligned} d\hat{\Phi }_3(t)&=\frac{1}{2}\mathbb {J}\left[ e^{-4Z(t)}(-4dZ(t))+\frac{1}{2}e^{-4Z(t)}(16\sin ^2W(t))dt\right] \\&=\frac{1}{2}e^{-4Z(t)}\mathbb {J}\left[ -8\sin ^2W(t)dt-4\sin W(t)dW(t)\right] \\&=-4\sin ^2W(t)e^{-4Z(t)}\mathbb {J}dt-2\sin W(t)e^{-4Z(t)}\mathbb {J}dW(t). \end{aligned}$$

Since \(\mathbb {J}^2=2\mathbb {J}\), we have \(\mathbb {J}\hat{\Phi }_3(t)=\mathbb {J}+(e^{-4Z(t)}-1)\mathbb {J}=e^{-4Z(t)}\mathbb {J}\), so the right-hand side equals \(-4\mathbb {J}\sin ^2W(t)\hat{\Phi }_3(t)dt-2\mathbb {J}\sin W(t)\hat{\Phi }_3(t)dW(t)\). Together with \(\hat{\Phi }_3(0)=I_2\), this proves that \(\hat{\Phi }_3\) is indeed the solution of (39). We obtain \(\hat{u}=0\) and \(\hat{X}(t)=\Phi _3(t)x_0\).
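The closed form can also be checked by simulating (39) directly and comparing with \(I_2+\frac{1}{2}(e^{-4Z(t)}-1)\mathbb {J}\) along the same Brownian path; a minimal Euler–Maruyama sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
J2 = np.ones((2, 2))                 # the matrix J of ones
T, N = 1.0, 20000
dt = T / N
Phi = np.eye(2)                      # Euler-Maruyama approximation of (39)
W, Z = 0.0, 0.0
for _ in range(N):
    dW = rng.normal(0.0, np.sqrt(dt))
    s = np.sin(W)
    Phi = Phi + (-4 * s**2) * (J2 @ Phi) * dt + (-2 * s) * (J2 @ Phi) * dW
    Z += 4 * s**2 * dt + s * dW      # dZ = 4 sin^2 W dt + sin W dW
    W += dW
closed_form = np.eye(2) + 0.5 * (np.exp(-4 * Z) - 1.0) * J2
print(np.max(np.abs(Phi - closed_form)))   # small for fine time grids
```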

4.4 Nonsmooth Running Cost and Control Constraint

In all previous examples, the optimal control is \(\hat{u}=0\). We now construct an example with a nonzero optimal control \(\hat{u}\). Assume that the state process X satisfies (28) with \(A=-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\), \(C=-(D^{\dagger })^{\top }B^{\top }\), and \(K=[-1,1]^m\). Choose a vector \(\kappa \in \mathbb {R}^m\) satisfying \(|(D^\dagger )^{\top } \kappa |>\Vert (D^\dagger )^{\top }\Vert |e|\), where \(e\in \mathbb {R}^m\) is the vector with all components equal to 1 and \(\Vert (D^\dagger )^{\top }\Vert \) is the matrix norm of \((D^\dagger )^{\top }\). (Such a \(\kappa \) exists; for example, we may choose \(\kappa =\lambda e\) with \(\lambda \) sufficiently large.) The objective function is given by

$$\begin{aligned} J(u):=E\left[ \int _0^Tf(u(t))dt+\frac{1}{2}X(T)^{\top }X(T) + X_0(T)^{\top }X(T)\right] , \end{aligned}$$
(40)

where \(f(u)=\sum _{i=1}^m |u_i|\) and \(X_0\) is the solution of the linear SDE

$$\begin{aligned} dX_0=[AX_0 + C^{\top }(D^\dagger )^{\top } \kappa ]dt + [CX_0 - (D^\dagger )^{\top } \kappa ]dW \end{aligned}$$

with the initial condition \(X_0(0)=0\). The solution \(X_0\) is given by

$$\begin{aligned} X_0(t)=\Phi _1(t)\int _0^t \Phi _1(s)^{-1}(C^{\top }+C)(D^\dagger )^{\top }\kappa \,ds-\Phi _1(t)\int _0^t \Phi _1(s)^{-1}(D^\dagger )^{\top }\kappa \,dW(s),\quad 0\le t\le T, \end{aligned}$$

and \(\Phi _1(t)\) is the \(n\times n\) matrix solution of SDE (38) at time t, see [24, Theorem 1.6.14]. Since \(X_0(T)\) is a random variable, the terminal cost function \(g(x)={1\over 2}x^{\top } x +X_0(T)^{\top }x\) is not deterministic, so the HJB approach is not applicable unless one introduces the augmented state variable \(Y:= (X,X_0)\); even then, the resulting HJB equation might not be solvable due to the increased dimension, even though \(X_0\) is not controlled. We may instead use the SMP to solve the problem. The adjoint equation is given by

$$\begin{aligned}&dp_1=-[A^{\top }p_1+C^{\top }q_{1}]dt+q_{1}dW,\nonumber \\&p_1(T)=-(X(T)+X_0(T)). \end{aligned}$$
(41)

The optimal control \(\hat{u}(t)\) is the maximum point of \(H(t,X(t),u,p(t),q(t))\) over \(u\in K\), where H is the Hamiltonian function defined by (30). Finding \(\hat{u}(t)\) thus requires solving a constrained optimization problem whose solution depends on X(t), p(t), q(t) but admits no closed-form expression in the presence of the nondifferentiable function f and the constraint set K. SDE (28), BSDE (41), and the maximum condition (32) form a fully coupled nonlinear FBSDE that is highly difficult to solve, and there is no plausible ansatz for the optimal control \(\hat{u}\).

We now try to solve the problem with the dual method. Simple calculus shows that the dual functions of f and g are given by

$$\begin{aligned} \phi (\beta )=\sum _{i=1}^m [ (\beta _i-1)^+ + (-\beta _i-1)^+] \end{aligned}$$

and

$$\begin{aligned} h(y) = {1\over 2} (y+X_0(T))^{\top }(y+X_0(T)). \end{aligned}$$
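
To spell out the first formula: since \(K=[-1,1]^m\) and \(f(u)=\sum _{i=1}^m|u_i|\), the maximization defining \(\phi \) splits across components,

$$\begin{aligned} \sup _{u\in K}\left[ \beta ^{\top }u-f(u)\right] =\sum _{i=1}^m\max _{|u_i|\le 1}\left( \beta _iu_i-|u_i|\right) =\sum _{i=1}^m(|\beta _i|-1)^+, \end{aligned}$$

and \((|\beta _i|-1)^+=(\beta _i-1)^++(-\beta _i-1)^+\); the formula for h follows by completing the square in the transform of g.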

The dual-state process Y satisfies SDE (34), and the dual problem is given by

$$\begin{aligned} \hbox {Minimize }\tilde{\Psi }(y,{\beta })&=x_0^{\top }y+E\left[ \int _0^T\phi (\beta (t))dt+h(Y(T))\right] . \end{aligned}$$

Define \(\bar{Y}(t)=Y(t)+X_0(t)\) for \(t\in [0,T]\). Then \(\bar{Y}\) satisfies the SDE

$$\begin{aligned} d\bar{Y}&=\left[ -\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\bar{Y}+BD^{\dagger }(D^{\dagger })^{\top }(\beta -\kappa )\right] dt\\&\quad +\left[ -(D^{\dagger })^{\top }B^{\top }\bar{Y}+(D^{\dagger })^{\top }(\beta -\kappa )\right] dW. \end{aligned}$$

Applying Itô’s formula to \(\bar{Y}^{\top }\bar{Y}\) and noting that the \(\bar{Y}\)-dependent terms in the drift and the quadratic variation cancel, we obtain

$$\begin{aligned} d[\bar{Y}^{\top }\bar{Y}]= (\beta -\kappa )^{\top }D^{\dagger }(D^{\dagger })^{\top }(\beta -\kappa ) dt + \hbox {martingale}. \end{aligned}$$

Noting that \(h(Y(T))={1\over 2}\bar{Y}(T)^{\top }\bar{Y}(T)\) and \(\bar{Y}(0)=Y(0)+X_0(0)=y\), we can write the dual objective function equivalently as

$$\begin{aligned} \tilde{\Psi }(y,{\beta }) = x_0^{\top }y+{1\over 2}y^{\top }y + E\left[ \int _0^T(\phi (\beta )+ {1\over 2} (\beta -\kappa )^{\top }D^{\dagger }(D^{\dagger })^{\top }(\beta -\kappa )) dt \right] . \end{aligned}$$

The dual optimal solution is given by \(\hat{y}=-x_0\) and by \(\hat{\beta }(t)\), the minimum point of the convex function \( \phi (\beta )+ {1\over 2} (\beta -\kappa )^{\top }D^{\dagger }(D^{\dagger })^{\top }(\beta -\kappa )\) over \(\beta \in \mathbb {R}^m\), for each \(t\in [0,T]\). A necessary and sufficient optimality condition for \(\hat{\beta }(t)\) is

$$\begin{aligned} 0 \in \partial \phi (\hat{\beta })+ D^{\dagger }(D^{\dagger })^{\top }(\hat{\beta }-\kappa ), \end{aligned}$$

where \(\partial \phi (\hat{\beta })\) is the subdifferential of \(\phi \) at \(\hat{\beta }\), given by

$$\begin{aligned} \partial \phi (\hat{\beta }) = \prod _{i=1}^m \partial [ (\hat{\beta }_i-1)^+ + (-\hat{\beta }_i-1)^+] \end{aligned}$$

and

$$\begin{aligned} \partial [ (\hat{\beta }_i-1)^+ + (-\hat{\beta }_i-1)^+] = {\left\{ \begin{array}{ll} \{-1\}, \quad \hat{\beta }_i<-1\\ {[}-1,0], \quad \hat{\beta }_i=-1\\ \{0\},\quad \hat{\beta }_i\in (-1,1)\\ {[}0,1],\quad \hat{\beta }_i=1\\ \{1\}, \quad \hat{\beta }_i>1. \end{array}\right. } \end{aligned}$$

We now show that \(D^{\dagger }(D^{\dagger })^{\top }(\hat{\beta }-\kappa )\ne 0\). Assume the contrary, that is, \(D^{\dagger }(D^{\dagger })^{\top }(\hat{\beta }-\kappa )=0\). Then \(0 \in \partial \phi (\hat{\beta })\), which implies \(|\hat{\beta }_i|\le 1\) for \(i=1,\ldots ,m\). On the other hand, from \(D D^{\dagger } =I_n\), we have \((D^{\dagger })^{\top }(\hat{\beta }-\kappa )=0\), that is, \((D^{\dagger })^{\top }\hat{\beta }=(D^{\dagger })^{\top }\kappa \), which implies \(|(D^{\dagger })^{\top }\hat{\beta }|=|(D^{\dagger })^{\top }\kappa |\). However, \(|(D^{\dagger })^{\top }\hat{\beta }|\le \Vert (D^{\dagger })^{\top }\Vert |\hat{\beta }|\le \Vert (D^{\dagger })^{\top }\Vert |e|\), whereas \(|(D^{\dagger })^{\top }\kappa | > \Vert (D^{\dagger })^{\top }\Vert |e|\) by the choice of \(\kappa \), a contradiction. Therefore, we must have \(D^{\dagger }(D^{\dagger })^{\top }(\hat{\beta }-\kappa )\ne 0\).
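
Although \(\hat{\beta }(t)\) admits no closed form, it solves a standard finite-dimensional convex problem that is easy to handle numerically. The following minimal proximal-gradient sketch is purely illustrative: it reuses the matrix \(D^{\dagger }\) from the previous example and takes \(\kappa =5e\), choices that are not part of the construction above.

```python
# Minimize phi(beta) + 0.5 * |Ddag^T (beta - kappa)|^2 by proximal gradient,
# where phi(beta) = sum_i (|beta_i| - 1)^+.  Illustrative data only.
import numpy as np

Ddag = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # m = 3, n = 2
m = Ddag.shape[0]
e = np.ones(m)
kappa = 5.0 * e
# the condition |(Ddag^T) kappa| > ||Ddag^T|| * |e| from the construction above
assert np.linalg.norm(Ddag.T @ kappa) > np.linalg.norm(Ddag.T, 2) * np.linalg.norm(e)

M = Ddag @ Ddag.T                       # Hessian of the smooth quadratic part
t = 1.0 / np.linalg.eigvalsh(M)[-1]     # step size 1/L, L = largest eigenvalue

def prox_phi(v, t):
    """Componentwise prox of t * sum_i (|v_i| - 1)^+."""
    a, s = np.abs(v), np.sign(v)
    return np.where(a <= 1.0, v, np.where(a >= 1.0 + t, s * (a - t), s))

beta = np.zeros(m)
for _ in range(5000):                   # ISTA-type iteration
    beta = prox_phi(beta - t * M @ (beta - kappa), t)

u_hat = -M @ (beta - kappa)             # candidate optimal control
print(beta, u_hat)
```

The last line evaluates \(-D^{\dagger }(D^{\dagger })^{\top }(\hat{\beta }-\kappa )\), anticipating the formula for \(\hat{u}\) derived below.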

From Theorem 2, the solution \((\hat{Y},\hat{p}_2,\hat{q}_2)\) of the FBSDE

$$\begin{aligned} d\hat{Y}&=[-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{Y} + BD^{\dagger }(D^{\dagger })^{\top }\hat{\beta }]dt +[-(D^{\dagger })^{\top }B^{\top }\hat{Y} + (D^{\dagger })^{\top } \hat{\beta }]dW,\nonumber \\ \hat{Y}(0)&=\hat{y},\nonumber \\ d\hat{p}_2&=[\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2+BD^{\dagger }\hat{q}_{2}]dt+\hat{q}_{2}dW,\nonumber \\ \hat{p}_2(T)&=-(\hat{Y}(T)+X_0(T)) \end{aligned}$$
(42)

satisfies the conditions

$$\begin{aligned}&\hat{p}_2(0)=x_0,\nonumber \\&D^{\dagger }\hat{q}_{2}+D^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2\in \partial \phi (\hat{\beta }),\nonumber \\&D^{\dagger }\hat{q}_{2}+D^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2\in K. \end{aligned}$$

Similarly to the derivation of the solution to FBSDE (36), we have

$$\begin{aligned} \hat{p}_2(t) = -\Phi _2(t)E[\Phi _1^{\top }(T)(\hat{Y}(T)+X_0(T))|\mathcal {F}_t]. \end{aligned}$$

Using Itô’s formula, we can check that \(\Phi _1^{\top }(t)(\hat{Y}(t)+X_0(t))\) is a martingale and obtain

$$\begin{aligned} \hat{p}_2(t) = -\Phi _2(t)\Phi _1^{\top }(t)(\hat{Y}(t)+X_0(t))=-(\hat{Y}(t)+X_0(t)), \ t\in [0,T]. \end{aligned}$$

Here we have used \(\Phi _2(t)=[\Phi _1^{\top }(t)]^{-1}\). Therefore,

$$\begin{aligned} d\hat{p}_2&= -d\hat{Y} - dX_0\\&= [-\frac{1}{2}BD^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2 - BD^{\dagger }(D^{\dagger })^{\top }(\hat{\beta }-\kappa )]dt\\&\quad +[-(D^{\dagger })^{\top }B^{\top }\hat{p}_2 - (D^{\dagger })^{\top } (\hat{\beta }-\kappa )]dW. \end{aligned}$$

Comparing the diffusion coefficient of the above equation with that of the BSDE in (42), we must have

$$\begin{aligned} \hat{q}_2= -(D^{\dagger })^{\top }B^{\top }\hat{p}_2 - (D^{\dagger })^{\top } (\hat{\beta }-\kappa ). \end{aligned}$$

From Theorem 3, the optimal control for the primal problem is given by

$$\begin{aligned} \hat{u}(t)=D^{\dagger }\hat{q}_{2}(t)+D^{\dagger }(D^{\dagger })^{\top }B^{\top }\hat{p}_2(t) = - D^{\dagger }(D^{\dagger })^{\top } (\hat{\beta }-\kappa ), \end{aligned}$$

which is nonzero for all \(t\in [0,T]\).

Remark 8

Since \(\hat{u}(t) \in \partial \phi (\hat{\beta }(t))\), the components of the optimal control \(\hat{u}(t)\) take values in the set \(\{-1,0,1\}\), depending on the dual optimal control \(\hat{\beta }(t)\). There is no closed-form solution \(\hat{\beta }\) of the dual problem; nevertheless, the dual problem is much easier to solve than the primal one: finding the dual optimal control \(\hat{\beta }\) is a standard finite-dimensional convex optimization problem that does not involve the dual-state and adjoint processes \(\hat{Y}, \hat{p}_2, \hat{q}_2\). This is in sharp contrast to finding the primal optimal control \(\hat{u}\) directly from the primal problem: \(\hat{u}\) depends on the primal-state and adjoint processes \(\hat{X}, \hat{p}_1, \hat{q}_1\), so one has to solve a fully coupled nonlinear FBSDE, a highly difficult infinite-dimensional problem, and even then there is no closed-form expression of \(\hat{u}(t)\) in terms of \(\hat{X}, \hat{p}_1, \hat{q}_1\). This example illustrates the usefulness of the dual formulation in solving the primal problem. We thank the anonymous reviewer whose suggestion of finding a nonzero optimal control motivated us to construct this nontrivial example.

5 Conclusions

In this paper, we have discussed a general multidimensional linear convex stochastic control problem with nondifferentiable objective function, control constraints, and random coefficients. We have formulated an equivalent dual problem, proved the dual stochastic maximum principle and the relations of the optimal control, optimal state, and adjoint processes between the primal and dual problems, and illustrated the usefulness of the dual approach with several examples. There remain many open questions, for example, the duality theory for Markov-modulated LC problems with control and terminal state constraints, pathwise state constraints as in [5], and other more general frameworks. We leave these and other questions for future research.