An Optimal Control Derivation of Nonlinear Smoothing Equations

Kim, Jin Won; Mehta, Prashant G.

doi:10.1007/978-3-030-51264-4_12

Jin Won Kim⁷ &
Prashant G. Mehta⁷

Part of the book series: Studies in Systems, Decision and Control ((SSDC,volume 304))

Included in the following conference series:

Advances in Dynamics, Optimization and Computation: A volume dedicated to Michael Dellnitz on the occasion of his 60th birthday

548 Accesses
4 Citations
1 Altmetric

Abstract

The purpose of this paper is to review and highlight some connections between the problem of nonlinear smoothing and optimal control of the Liouville equation. The latter has been an active area of recent research interest owing to work in mean-field games and optimal transportation theory. The nonlinear smoothing problem is considered here for continuous-time Markov processes. The observation process is modeled as a nonlinear function of a hidden state with an additive Gaussian measurement noise. A variational formulation is described based upon the relative entropy formula introduced by Newton and Mitter. The resulting optimal control problem is formulated on the space of probability distributions. The Hamilton’s equation of the optimal control are related to the Zakai equation of nonlinear smoothing via the log transformation. The overall procedure is shown to generalize the classical Mortensen’s minimum energy estimator for the linear Gaussian problem.

To Michael Dellnitz on the occasion of his 60th birthday.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Bensoussan, A.: Estimation and Control of Dynamical Systems, vol. 48. Springer, Heidelberg (2018)
Book Google Scholar
Bensoussan, A., Frehse, J., Yam, P., et al.: Mean Field Games and Mean Field Type Control Theory, vol. 101. Springer, Heidelberg (2013)
Book Google Scholar
Brockett, R.W.: Optimal control of the liouville equation. AMS IP Stud. Adv. Math. 39, 23 (2007)
Article MathSciNet Google Scholar
Carmona, R., Delarue, F., et al.: Probabilistic Theory of Mean Field Games with Applications I-II. Springer, Heidelberg (2018)
Book Google Scholar
Chen, Y., Georgiou, T.T., Pavon, M.: On the relation between optimal transport and Schrödinger bridges: a stochastic control viewpoint. J. Optim. Theory Appl. 169(2), 671–691 (2016)
Article MathSciNet Google Scholar
Chetrite, R., Touchette, H.: Variational and optimal control representations of conditioned and driven processes. J. Stat. Mech.: Theory Exp. 2015(12), P12001 (2015)
Google Scholar
Fleming, W., Mitter, S.: Optimal control and nonlinear filtering for nondegenerate diffusion processes. Stochastics 8, 63–77 (1982)
Article MathSciNet Google Scholar
Kailath, T., Sayed, A.H., Hassibi, B.: Linear Estimation. Prentice Hall, Upper Saddle River (2000)
MATH Google Scholar
Kappen, H.J., Ruiz, H.C.: Adaptive importance sampling for control and inference. J. Stat. Phys. 162(5), 1244–1266 (2016)
Article MathSciNet Google Scholar
Mitter, S.K., Newton, N.J.: A variational approach to nonlinear estimation. SIAM J. Control Optim. 42(5), 1813–1833 (2003)
Article MathSciNet Google Scholar
Mortensen, R.E.: Maximum-likelihood recursive nonlinear filtering. J. Optim. Theory Appl. 2(6), 386–394 (1968)
Article MathSciNet Google Scholar
Pardoux, E.: Non-linear filtering, prediction and smoothing. In: Stochastic Systems: the Mathematics of Filtering and Identification and Applications, pp. 529–557. Springer (1981)
Google Scholar
Reich, S.: Data assimilation: the Schrödinger perspective. Acta Numerica 28, 635–711 (2019)
Article MathSciNet Google Scholar
Rogers, L.C.G., Williams, D.: Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus, vol. 2. Cambridge University Press, Cambridge (2000)
Book Google Scholar
Ruiz, H., Kappen, H.J.: Particle smoothing for hidden diffusion processes: adaptive path integral smoother. IEEE Trans. Signal Process. 65(12), 3191–3203 (2017)
Article MathSciNet Google Scholar
Sutter, T., Ganguly, A., Koeppl, H.: A variational approach to path estimation and parameter inference of hidden diffusion processes. J. Mach. Learn. Res. 17, 6544–80 (2016)
MathSciNet MATH Google Scholar
Van Handel, R.: Filtering, stability, and robustness. Ph.D. thesis, California Institute of Technology (2006)
Google Scholar

Download references

Author information

Authors and Affiliations

Coordinated Science Laboratory and Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Jin Won Kim & Prashant G. Mehta

Authors

Jin Won Kim
View author publications
You can also search for this author in PubMed Google Scholar
Prashant G. Mehta
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Prashant G. Mehta .

Editor information

Editors and Affiliations

Department of Mathematics, Technical University of Munich, Munich, Germany
Oliver Junge
Computer Science Department, Cinvestav-IPN, Mexico City, Distrito Federal, Mexico
Oliver Schütze
School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, Australia
Gary Froyland
Department of Mathematics, Paderborn University, Paderborn, Germany
Sina Ober-Blöbaum
Institute of Mathematics and its Didactics, Leuphana University Lüneburg, Lüneburg, Germany
Kathrin Padberg-Gehle

A Appendix

1.1 A.1 Derivation of Lagrangian: Euclidean Case

By Girsanov’s theorem, the Radon-Nikodym derivative is obtained (see [13, Eqn. 35]) as follows:

$$ \frac{\,\mathrm {d}\tilde{P}}{\,\mathrm {d}P}(\tilde{X}) = \frac{\,\mathrm {d}\pi _0}{\,\mathrm {d}\nu _0}(\tilde{X}_0) \; \exp \Big (\int _0^T {\frac{1}{2}}|u_t(\tilde{X}_t)|^2 \,\mathrm {d}t + u_t(\tilde{X}_t) \,\mathrm {d}\tilde{B}_t \Big ). $$

Thus, we obtain the relative entropy formula:

$$\begin{aligned} \mathsf{D}(\tilde{P}\Vert P)&= \mathsf{E}\Big (\log \dfrac{\,\mathrm {d}\pi _0}{\,\mathrm {d}\nu _0}(\tilde{X}_0) + \int _0^T {\frac{1}{2}}|u_t(\tilde{X}_t)|^2 \,\mathrm {d}t + u_t(\tilde{X}_t) \,\mathrm {d}\tilde{B}_t \Big )\\&=\mathsf{D}(\pi _0\Vert \nu _0) + \int _0^T {\frac{1}{2}}\langle \pi _t,|u_t|^2\rangle \,\mathrm {d}t. \end{aligned}$$

1.2 A.2 Derivation of Lagrangian: Finite State-Space Case

The derivation of the Lagrangian is entirely analogous to the Euclidean case except the R-N derivative is given according to [17, Prop. 2.1.1]:

Upon taking log and expectation of both sides, we arrive at the relative entropy formula:

1.3 A.3 Proof of Proposition 1

The standard approach is to incorporate the constraint into the objective function by introducing the Lagrange multiplier $\lambda = \{\lambda _t:0\le t\le T\}$ as follows:

$$\begin{aligned} \tilde{J}&(u,\lambda \,;\pi _0,z)\\&= \mathsf{D}(\pi _0 \Vert \nu _0) + \int _0^T {\frac{1}{2}}\langle \pi _t, |u_t|^2 + h^2\rangle + z_t\langle \pi _t, \tilde{\mathcal{A}}(u_t)h\rangle \,\mathrm {d}t\\&\quad +\int _0^T \langle \lambda _t, \frac{\partial \pi _t}{\partial t} - \tilde{\mathcal{A}}^\dagger (u_t) \pi _t \rangle \,\mathrm {d}t - z_T\langle \pi _T,h\rangle . \end{aligned}$$

Upon using integration by parts and the definition of the adjoint operator, after some manipulation involving completion of squares, we arrive at

$$\begin{aligned} \tilde{\mathsf{J}}(u,&\lambda \,;\pi _0,z)=\mathsf{D}(\pi _0 \Vert \nu _0) + \int _0^T {\frac{1}{2}}\langle \pi _t, |u_t - \sigma ^\top \nabla (\lambda _t - z_th)|^2\rangle \,\mathrm {d}t\\&-\int _0^T \langle \pi _t, \frac{\partial }{\partial t}\lambda _t + \mathcal{A}(\lambda _t - z_th)-{\frac{1}{2}}h^2+{\frac{1}{2}}|\sigma ^\top \nabla (\lambda _t - z_th)|^2\rangle \,\mathrm {d}t\\&+ \langle \pi _T,\lambda _T-z_Th\rangle - \langle \pi _0,\lambda _0\rangle . \end{aligned}$$

Therefore, it is natural to pick $\lambda $ to satisfy the following partial differential equation:

$$\begin{aligned} -\frac{\partial \lambda _t}{\partial t}(x)&= \big (\mathcal{A} (\lambda _t(\cdot ) - z_th(\cdot ))\big ) - {\frac{1}{2}}h^2(x)+{\frac{1}{2}}\big |\sigma ^\top \nabla (\lambda _t - z_th)(x)\big |^2 \\&= e^{-(\lambda _t(x) - z_th(x))}(\mathcal{A}e^{\lambda _t(\cdot ) - z_th(\cdot )})(x) - {\frac{1}{2}}h^2(x) \nonumber \end{aligned}$$

(14)

with the boundary condition $\lambda _T(x) = z_Th(x)$. With this choice, the objective function becomes

$$\begin{aligned} \tilde{\mathsf{J}}(u\,;\lambda ,\pi _0,z)&= \mathsf{D}(\pi _0 \Vert \nu _0) - \langle \pi _0, \lambda _0\rangle \\&+ \int _0^T {\frac{1}{2}}\pi _t\big ( \big |u_t - \sigma ^\top \nabla (\lambda _t-z_th)\big |^2\big ) \,\mathrm {d}t \end{aligned}$$

which suggest the optimal choice of control is:

$$\begin{aligned} u_t(x) = \sigma ^\top (x) \nabla (\lambda _t-z_th)(x). \end{aligned}$$

With this choice, the objective function becomes

$$\begin{aligned} \mathsf{D}(\pi _0\Vert \nu _0) - \langle \pi _0, \lambda _0\rangle&= \int _{\mathbb {S}}\pi _0(x) \log \frac{\pi _0(x)}{\nu _0(x)}- \lambda _0(x)\pi _0(x)\,\mathrm {d}x\\&=\int _{\mathbb {S}} \pi _0(x)\log \frac{\pi _0(x)}{\nu _0\exp (\lambda _0(x))}\,\mathrm {d}x \end{aligned}$$

which is minimized by choosing

$$ \pi _0(x) = \frac{1}{C} \nu _0(x)\exp (\lambda _0(x)) $$

where C is the normalization constant.

1.4 A.4 Proof of Proposition 2

The proof for the finite state-space case is entirely analogous to the proof for the Euclidean case. The Lagrange multiplier $\lambda = \{\lambda _t\in \mathbb {R}^d: 0\le t\le T\}$ is introduced to transform the optimization problem into an unconstrained problem:

$$\begin{aligned} \tilde{\mathsf{J}}(u,\lambda \,;\pi _0,z)&= \mathsf{D}(\pi _0\Vert \nu _0)+ \int _0^T\pi _t^\top \big (C(u_t)+{\frac{1}{2}}h^2 + z_t \tilde{A}(u_t)h \big )\,\mathrm {d}t \\&+ \int _0^T\lambda _t^\top \big (\frac{\,\mathrm {d}\pi _t}{\,\mathrm {d}t} - \tilde{A}^\top (u_t) \pi _t\big )\,\mathrm {d}t - z_T h^\top \pi _T. \end{aligned}$$

Upon using integral by parts,

$$\begin{aligned} \tilde{\mathsf{J}}(u,\lambda \,;\pi _0,z)&= \mathsf{D}(\pi _0\Vert \nu _0) + \int _0^T \pi _t^\top \big (C(u_t) -\tilde{A}(u_t)(\lambda _t - z_th)\big ) \,\mathrm {d}t\\&+\int _0^T\pi _t^\top (-\dot{\lambda }_t+ {\frac{1}{2}}h^2)\,\mathrm {d}t +\pi _T^\top (\lambda _T-z_Th) - \pi _0^\top \lambda _0. \end{aligned}$$

The first integrand is

The minimizer is obtained, element by element, as

$$ [u_t]_{ij} = e^{([\lambda _t - z_th]_j-[\lambda _t - z_th]_i)} $$

and the corresponding minimum value is obtained by:

$$ [C(u_t^*) -\tilde{A}_t(\lambda _t - Z_th)]_i = -[Ae^{\lambda _t-z_th}]_i [e^{-(\lambda _t-z_th)}]_i. $$

Therefore with the minimum choice of $u_t$ above,

$$\begin{aligned} \tilde{\mathsf{J}}(u,\lambda \,;\pi _0,z)&= \mathsf{D}(\pi _0\Vert \nu _0) + \int _0^T \pi _t^\top \big (-(Ae^{\lambda _t-z_th}) \cdot e^{-(\lambda _t-z_th)}\big ) \,\mathrm {d}t\\&+\int _0^T\pi _t^\top (-\dot{\lambda }_t+ {\frac{1}{2}}h^2)\,\mathrm {d}t +\pi _T^\top (\lambda _T-z_Th) - \pi _0^\top \lambda _0. \end{aligned}$$

Upon choosing $\lambda $ according to:

$$ -[\dot{\lambda }_t]_i = [Ae^{\lambda _t-z_th}]_i [e^{-(\lambda _t-z_th)}]_i - {\frac{1}{2}}h_i^2,\quad \lambda _T = z_Th. $$

The objective function simplifies to

$$ \mathsf{D}(\pi _0\Vert \nu _0) - \pi _0^\top \lambda _0 = \sum _{i=1}^d[\pi _0]_i \log \frac{[\pi _0]_i}{[\nu _0]_ie^{[\lambda _0]_i}} $$

where the minimum value is obtained by choosing

$$ [\pi _0]_i =\frac{1}{C} [\nu _0]_ie^{[\lambda _0]_i} $$

where C is the normalization constant.

1.5 A.5 Proof of Proposition 3

Euclidean Case. Equation (9b) is identical to the backward path-wise Eq. (4). So, we need to only derive the equation for $\mu _t$. Using the regular form of the product formula,

$$\begin{aligned} \frac{\partial \mu _t}{\partial t}&= \frac{1}{\pi _t}\frac{\partial \pi _t}{\partial t} -\frac{\partial \lambda _t}{\partial t}\\&=\frac{1}{\pi _t}(\tilde{\mathcal{A}}^\dagger (u_t)\pi _t) + e^{-(\lambda _t - z_th)}(\mathcal{A}e^{\lambda _t(\cdot ) - z_th(\cdot )}) - {\frac{1}{2}}h^2. \end{aligned}$$

With optimal control $u_t = \sigma ^\top \nabla (\lambda _t-z_th)$,

$$\begin{aligned} (\tilde{\mathcal{A}}^\dagger (u_t)\pi _t)&=(\mathcal{A}^\dagger \pi _t)-{\text {div}}\big (\sigma \sigma ^\top \nabla \pi _t\big )\\&\quad +\pi _t {\text {div}}\big (\sigma \sigma ^\top \nabla (\mu _t+z_th)\big )\\&\quad +(\nabla \pi _t)^\top (\sigma \sigma ^\top \nabla (\mu _t+z_th)) \end{aligned}$$

and

$$\begin{aligned}&e^{-(\lambda _t - z_th)}(\mathcal{A}e^{\lambda _t(\cdot ) - z_th(\cdot )})\\&=\frac{1}{\pi _t}(\mathcal{A}\pi _t) - {\frac{1}{2}}|\sigma ^\top \nabla \log \pi _t|^2 - (\mathcal{A}(\mu _t + z_th)) \\&\quad + {\frac{1}{2}}\big |\sigma ^\top \nabla \log (\pi _t) - \sigma ^\top \nabla (\mu _t +z_th)\big |^2. \end{aligned}$$

Therefore,

$$\begin{aligned} \frac{\partial \mu _t}{\partial t}&=\frac{1}{\pi _t}\big ((\mathcal{A}^\dagger \pi _t) + (\mathcal{A}\pi _t)-{\text {div}}(\sigma \sigma ^\top \nabla \pi _t)\big )\\&\quad -(\mathcal{A}(\mu _t + z_th)) + {\text {div}}\big (\sigma \sigma ^\top \nabla (\mu _t+z_th)\big )\\&\quad + {\frac{1}{2}}\big |\sigma ^\top \nabla (\mu _t+z_th)\big |^2 - {\frac{1}{2}}h^2\\&=e^{-(\mu _t(x)+z_th(x))}\big (\mathcal{A}^\dagger e^{(\mu _t(\cdot ) +z_th(\cdot ) )}\big )(x) - {\frac{1}{2}}h^2(x) \end{aligned}$$

with the boundary condition $\mu _0 = \log \nu _0$.

Finite State-Space Case. Equation (11b) is identical to the backward path-wise Eq. (6). To derive the equation for $\mu _t$, use the product formula

$$\begin{aligned} \Big [\frac{\,\mathrm {d}\mu _t}{\,\mathrm {d}t}\Big ]_i&= \frac{1}{[\pi _t]_i}\Big [\frac{\,\mathrm {d}\pi _t}{\,\mathrm {d}t}\Big ]_i - \Big [\frac{\,\mathrm {d}\lambda _t}{\,\mathrm {d}t}\Big ]_i\\&=\frac{1}{[\pi _t]_i}\big [\tilde{A}^\top (u_t)\pi _t\big ]_i + [e^{-(\lambda _t-z_th)}]_i[Ae^{\lambda _t+z_th}]_i - {\frac{1}{2}}[h^2]_i. \end{aligned}$$

The first term is:

$$\begin{aligned} \big [\tilde{A}^\top (u_t)\pi _t\big ]_i&= \sum _{j=1}^d \Big ([A]_{ji}[u_t]_{ji}[\pi _t]_j-[A]_{ij}[u_t]_{ij}[\pi _t]_i\Big ) \end{aligned}$$

and the second term is:

$$\begin{aligned}{}[e^{-(\lambda _t-z_th)}]_i&[Ae^{\lambda _t+z_th}]_i \\&= \frac{1}{[\pi _t]_i}[e^{\mu _t+z_th}]_i\sum _{j=1}^d [A]_{ij} [\pi _t]_j [e^{-(\mu _t +z_th)}]_j. \end{aligned}$$

The formula for the optimal control gives

$$\begin{aligned}{}[u_t]_{ij}&=\frac{[\pi _t]_j}{[\pi _t]_i}[e^{-(\mu _t + z_th)}]_j[e^{\mu _t + z_th}]_i. \end{aligned}$$

Combining these expressions,

$$\begin{aligned} \Big [\frac{\,\mathrm {d}\mu _t}{\,\mathrm {d}t}\Big ]_i&= \sum _{j=1}^d[A]_{ji}[e^{-(\mu _t + z_th)}]_i[e^{\mu _t + z_th}]_j - {\frac{1}{2}}[h^2]_i\\&=[e^{-(\mu _t + z_th)}]_i [A^\top e^{\mu _t + z_th}]_i - {\frac{1}{2}}[h^2]_i \end{aligned}$$

which is precisely the path-wise form of the Eq. (5). At time $t=0$, $\mu _0 = \log (C[\pi _0]_i) - [\lambda _0]_i = \log [\nu _0]_i $.

Smoothing Distribution. Since $(\lambda _t, \mu _t)$ is the solution to the path-wise form of the Zakai equations, the optimal trajectory

$$ \pi _t = \frac{1}{C}e^{\mu _t+\lambda _t} $$

represents the smoothing distribution.

1.6 A.6 Proof of Proposition 4

The dynamic programming equation for the optimal control problem is given by (see [1, Ch. 11.2]):

$$\begin{aligned} \min _{u\in \mathbb {R}^p} \Big \{\frac{\partial V_t}{\partial t}(x) + (\tilde{\mathcal{A}}(u) V_t)(x) + l(x,u\,;z_t)\Big \} = 0. \end{aligned}$$

(15)

Therefore,

$$\begin{aligned} -\frac{\partial V_t}{\partial t}(x)&= (\mathcal{A}V_t)(x) + h^2(x) + z_t(\mathcal{A}h)(x) \\&+ \min _{u}\Big \{{\frac{1}{2}}|u|^2 + u^\top \big (\sigma ^\top \nabla V_t(x) + z_t\sigma ^\top \nabla h(x)\big )\Big \}. \end{aligned}$$

Upon using the completion-of-square trick, the minimum is attained by a feedback form:

$$ u^* = -\sigma ^\top \nabla (V_t +z_th)(x). $$

The resulting HJB equation is given by

$$\begin{aligned} -\frac{\partial V_t}{\partial t}(x)&= \big (\mathcal{A}(V_t + z_th)\big )(x) + h^2(x) -{\frac{1}{2}}|\sigma ^\top \nabla (V_t+z_th)|^2 \end{aligned}$$

with boundary condition $V_T(x) = - z_Th(x)$. Compare the HJB equation with the Eq. (14) for $\lambda $, and it follows

$$ V_t(x) = -\lambda _t(x). $$

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kim, J.W., Mehta, P.G. (2020). An Optimal Control Derivation of Nonlinear Smoothing Equations. In: Junge, O., Schütze, O., Froyland, G., Ober-Blöbaum, S., Padberg-Gehle, K. (eds) Advances in Dynamics, Optimization and Computation. SON 2020. Studies in Systems, Decision and Control, vol 304. Springer, Cham. https://doi.org/10.1007/978-3-030-51264-4_12

Download citation

DOI: https://doi.org/10.1007/978-3-030-51264-4_12
Published: 21 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51263-7
Online ISBN: 978-3-030-51264-4
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

An Optimal Control Derivation of Nonlinear Smoothing Equations

Abstract

Access this chapter

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

A Appendix

A Appendix

1.1 A.1 Derivation of Lagrangian: Euclidean Case

1.2 A.2 Derivation of Lagrangian: Finite State-Space Case

1.3 A.3 Proof of Proposition 1

1.4 A.4 Proof of Proposition 2

1.5 A.5 Proof of Proposition 3

1.6 A.6 Proof of Proposition 4

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation