An Optimal Control Derivation of Nonlinear Smoothing Equations

  • Conference paper in: Advances in Dynamics, Optimization and Computation (SON 2020)

Abstract

The purpose of this paper is to review and highlight some connections between the problem of nonlinear smoothing and optimal control of the Liouville equation. The latter has been an active area of recent research interest owing to work in mean-field games and optimal transportation theory. The nonlinear smoothing problem is considered here for continuous-time Markov processes. The observation process is modeled as a nonlinear function of the hidden state with additive Gaussian measurement noise. A variational formulation is described based upon the relative entropy formula introduced by Newton and Mitter. The resulting optimal control problem is formulated on the space of probability distributions. Hamilton's equations of optimal control are related to the Zakai equation of nonlinear smoothing via the log transformation. The overall procedure is shown to generalize the classical Mortensen minimum-energy estimator for the linear Gaussian problem.

To Michael Dellnitz on the occasion of his 60th birthday.


References

  1. Bensoussan, A.: Estimation and Control of Dynamical Systems, vol. 48. Springer, Heidelberg (2018)

  2. Bensoussan, A., Frehse, J., Yam, P., et al.: Mean Field Games and Mean Field Type Control Theory, vol. 101. Springer, Heidelberg (2013)

  3. Brockett, R.W.: Optimal control of the Liouville equation. AMS IP Stud. Adv. Math. 39, 23 (2007)

  4. Carmona, R., Delarue, F., et al.: Probabilistic Theory of Mean Field Games with Applications I-II. Springer, Heidelberg (2018)

  5. Chen, Y., Georgiou, T.T., Pavon, M.: On the relation between optimal transport and Schrödinger bridges: a stochastic control viewpoint. J. Optim. Theory Appl. 169(2), 671–691 (2016)

  6. Chetrite, R., Touchette, H.: Variational and optimal control representations of conditioned and driven processes. J. Stat. Mech.: Theory Exp. 2015(12), P12001 (2015)

  7. Fleming, W., Mitter, S.: Optimal control and nonlinear filtering for nondegenerate diffusion processes. Stochastics 8, 63–77 (1982)

  8. Kailath, T., Sayed, A.H., Hassibi, B.: Linear Estimation. Prentice Hall, Upper Saddle River (2000)

  9. Kappen, H.J., Ruiz, H.C.: Adaptive importance sampling for control and inference. J. Stat. Phys. 162(5), 1244–1266 (2016)

  10. Mitter, S.K., Newton, N.J.: A variational approach to nonlinear estimation. SIAM J. Control Optim. 42(5), 1813–1833 (2003)

  11. Mortensen, R.E.: Maximum-likelihood recursive nonlinear filtering. J. Optim. Theory Appl. 2(6), 386–394 (1968)

  12. Pardoux, E.: Non-linear filtering, prediction and smoothing. In: Stochastic Systems: The Mathematics of Filtering and Identification and Applications, pp. 529–557. Springer (1981)

  13. Reich, S.: Data assimilation: the Schrödinger perspective. Acta Numerica 28, 635–711 (2019)

  14. Rogers, L.C.G., Williams, D.: Diffusions, Markov Processes and Martingales, Volume 2: Itô Calculus. Cambridge University Press, Cambridge (2000)

  15. Ruiz, H., Kappen, H.J.: Particle smoothing for hidden diffusion processes: adaptive path integral smoother. IEEE Trans. Signal Process. 65(12), 3191–3203 (2017)

  16. Sutter, T., Ganguly, A., Koeppl, H.: A variational approach to path estimation and parameter inference of hidden diffusion processes. J. Mach. Learn. Res. 17, 6544–80 (2016)

  17. Van Handel, R.: Filtering, stability, and robustness. Ph.D. thesis, California Institute of Technology (2006)


Author information

Correspondence to Prashant G. Mehta.


A Appendix

1.1 A.1 Derivation of Lagrangian: Euclidean Case

By Girsanov’s theorem, the Radon-Nikodym derivative is obtained (see  [13, Eqn. 35]) as follows:

$$ \frac{\,\mathrm {d}\tilde{P}}{\,\mathrm {d}P}(\tilde{X}) = \frac{\,\mathrm {d}\pi _0}{\,\mathrm {d}\nu _0}(\tilde{X}_0) \; \exp \Big (\int _0^T {\frac{1}{2}}|u_t(\tilde{X}_t)|^2 \,\mathrm {d}t + u_t(\tilde{X}_t) \,\mathrm {d}\tilde{B}_t \Big ). $$

Thus, we obtain the relative entropy formula:

$$\begin{aligned} \mathsf{D}(\tilde{P}\Vert P)&= \mathsf{E}\Big (\log \dfrac{\,\mathrm {d}\pi _0}{\,\mathrm {d}\nu _0}(\tilde{X}_0) + \int _0^T {\frac{1}{2}}|u_t(\tilde{X}_t)|^2 \,\mathrm {d}t + u_t(\tilde{X}_t) \,\mathrm {d}\tilde{B}_t \Big )\\&=\mathsf{D}(\pi _0\Vert \nu _0) + \int _0^T {\frac{1}{2}}\langle \pi _t,|u_t|^2\rangle \,\mathrm {d}t. \end{aligned}$$
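This formula can be illustrated with a quick Monte Carlo experiment (a hypothetical scalar example, not from the paper: constant control \(u_t \equiv c\) and \(\pi _0 = \nu _0\), so that \(\mathsf{D}(\tilde{P}\Vert P) = {\frac{1}{2}}c^2T\)). The stochastic integral has zero mean under \(\tilde{P}\), so the empirical mean of the log Radon–Nikodym derivative over simulated paths should match the relative entropy:

```python
import numpy as np

# Hypothetical scalar example: constant control u_t = c and pi_0 = nu_0,
# so D(P~ || P) = (1/2) c^2 T.  Monte Carlo estimate of E[log dP~/dP].
rng = np.random.default_rng(0)
c, T = 0.7, 1.0
n_paths, n_steps = 50_000, 100
dt = T / n_steps

# Increments of the Brownian motion B~ under the controlled measure P~.
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

# log dP~/dP = int_0^T (1/2)|u_t|^2 dt + int_0^T u_t dB~_t   (pi_0 = nu_0)
log_rn = 0.5 * c**2 * T + c * dB.sum(axis=1)

empirical = log_rn.mean()   # Monte Carlo estimate of D(P~ || P)
exact = 0.5 * c**2 * T
print(empirical, exact)
```

The martingale term \(c(\tilde{B}_T - \tilde{B}_0)\) averages to zero, which is exactly why the relative entropy reduces to the \(\int _0^T {\frac{1}{2}}\langle \pi _t,|u_t|^2\rangle \,\mathrm {d}t\) term.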

1.2 A.2 Derivation of Lagrangian: Finite State-Space Case

The derivation of the Lagrangian is entirely analogous to the Euclidean case, except that the Radon–Nikodym derivative is given according to [17, Prop. 2.1.1]:

Upon taking log and expectation of both sides, we arrive at the relative entropy formula:

1.3 A.3 Proof of Proposition 1

The standard approach is to incorporate the constraint into the objective function by introducing the Lagrange multiplier \(\lambda = \{\lambda _t:0\le t\le T\}\) as follows:

$$\begin{aligned} \tilde{\mathsf{J}}&(u,\lambda \,;\pi _0,z)\\&= \mathsf{D}(\pi _0 \Vert \nu _0) + \int _0^T {\frac{1}{2}}\langle \pi _t, |u_t|^2 + h^2\rangle + z_t\langle \pi _t, \tilde{\mathcal{A}}(u_t)h\rangle \,\mathrm {d}t\\&\quad +\int _0^T \langle \lambda _t, \frac{\partial \pi _t}{\partial t} - \tilde{\mathcal{A}}^\dagger (u_t) \pi _t \rangle \,\mathrm {d}t - z_T\langle \pi _T,h\rangle . \end{aligned}$$

Upon using integration by parts and the definition of the adjoint operator, after some manipulation involving completion of squares, we arrive at

$$\begin{aligned} \tilde{\mathsf{J}}(u,&\lambda \,;\pi _0,z)=\mathsf{D}(\pi _0 \Vert \nu _0) + \int _0^T {\frac{1}{2}}\langle \pi _t, |u_t - \sigma ^\top \nabla (\lambda _t - z_th)|^2\rangle \,\mathrm {d}t\\&-\int _0^T \langle \pi _t, \frac{\partial }{\partial t}\lambda _t + \mathcal{A}(\lambda _t - z_th)-{\frac{1}{2}}h^2+{\frac{1}{2}}|\sigma ^\top \nabla (\lambda _t - z_th)|^2\rangle \,\mathrm {d}t\\&+ \langle \pi _T,\lambda _T-z_Th\rangle - \langle \pi _0,\lambda _0\rangle . \end{aligned}$$

Therefore, it is natural to pick \(\lambda \) to satisfy the following partial differential equation:

$$\begin{aligned} -\frac{\partial \lambda _t}{\partial t}(x)&= \big (\mathcal{A} (\lambda _t(\cdot ) - z_th(\cdot ))\big )(x) - {\frac{1}{2}}h^2(x)+{\frac{1}{2}}\big |\sigma ^\top \nabla (\lambda _t - z_th)(x)\big |^2 \\&= e^{-(\lambda _t(x) - z_th(x))}(\mathcal{A}e^{\lambda _t(\cdot ) - z_th(\cdot )})(x) - {\frac{1}{2}}h^2(x) \nonumber \end{aligned}$$
(14)

with the boundary condition \(\lambda _T(x) = z_Th(x)\). With this choice, the objective function becomes

$$\begin{aligned} \tilde{\mathsf{J}}(u\,;\lambda ,\pi _0,z)&= \mathsf{D}(\pi _0 \Vert \nu _0) - \langle \pi _0, \lambda _0\rangle \\&+ \int _0^T {\frac{1}{2}}\langle \pi _t, \big |u_t - \sigma ^\top \nabla (\lambda _t-z_th)\big |^2\rangle \,\mathrm {d}t \end{aligned}$$

which suggests that the optimal choice of control is:

$$\begin{aligned} u_t(x) = \sigma ^\top (x) \nabla (\lambda _t-z_th)(x). \end{aligned}$$

With this choice, the objective function becomes

$$\begin{aligned} \mathsf{D}(\pi _0\Vert \nu _0) - \langle \pi _0, \lambda _0\rangle&= \int _{\mathbb {S}}\pi _0(x) \log \frac{\pi _0(x)}{\nu _0(x)}- \lambda _0(x)\pi _0(x)\,\mathrm {d}x\\&=\int _{\mathbb {S}} \pi _0(x)\log \frac{\pi _0(x)}{\nu _0(x)\exp (\lambda _0(x))}\,\mathrm {d}x \end{aligned}$$

which is minimized by choosing

$$ \pi _0(x) = \frac{1}{C} \nu _0(x)\exp (\lambda _0(x)) $$

where C is the normalization constant.
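This last minimization step can be checked numerically in a discrete analog (a hypothetical 4-point state space with randomly generated \(\nu _0\) and \(\lambda _0\); not an example from the paper): over the probability simplex, \(\mathsf{D}(\pi \Vert \nu _0) - \langle \pi , \lambda _0\rangle \) is minimized by \(\pi \propto \nu _0 e^{\lambda _0}\), and the minimum value is \(-\log C\):

```python
import numpy as np

# Hypothetical discrete analog: minimize D(pi || nu0) - <pi, lam0> over
# the probability simplex; the minimizer is pi = nu0 * exp(lam0) / C.
rng = np.random.default_rng(1)
nu0 = rng.random(4)
nu0 /= nu0.sum()
lam0 = rng.normal(size=4)

def objective(pi):
    return np.sum(pi * np.log(pi / nu0)) - pi @ lam0

C = np.sum(nu0 * np.exp(lam0))       # normalization constant
pi_star = nu0 * np.exp(lam0) / C

# Compare against random points on the simplex.
best_random = min(objective(rng.dirichlet(np.ones(4))) for _ in range(1000))
print(objective(pi_star), -np.log(C), best_random)
```

By strict convexity of the relative entropy, \(\pi ^*\) is the unique global minimizer over the simplex.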

1.4 A.4 Proof of Proposition 2

The proof for the finite state-space case is entirely analogous to the proof for the Euclidean case. The Lagrange multiplier \(\lambda = \{\lambda _t\in \mathbb {R}^d: 0\le t\le T\}\) is introduced to transform the optimization problem into an unconstrained problem:

$$\begin{aligned} \tilde{\mathsf{J}}(u,\lambda \,;\pi _0,z)&= \mathsf{D}(\pi _0\Vert \nu _0)+ \int _0^T\pi _t^\top \big (C(u_t)+{\frac{1}{2}}h^2 + z_t \tilde{A}(u_t)h \big )\,\mathrm {d}t \\&+ \int _0^T\lambda _t^\top \big (\frac{\,\mathrm {d}\pi _t}{\,\mathrm {d}t} - \tilde{A}^\top (u_t) \pi _t\big )\,\mathrm {d}t - z_T h^\top \pi _T. \end{aligned}$$

Upon using integration by parts,

$$\begin{aligned} \tilde{\mathsf{J}}(u,\lambda \,;\pi _0,z)&= \mathsf{D}(\pi _0\Vert \nu _0) + \int _0^T \pi _t^\top \big (C(u_t) -\tilde{A}(u_t)(\lambda _t - z_th)\big ) \,\mathrm {d}t\\&+\int _0^T\pi _t^\top (-\dot{\lambda }_t+ {\frac{1}{2}}h^2)\,\mathrm {d}t +\pi _T^\top (\lambda _T-z_Th) - \pi _0^\top \lambda _0. \end{aligned}$$

The first integrand is

The minimizer is obtained, element by element, as

$$ [u_t]_{ij} = e^{([\lambda _t - z_th]_j-[\lambda _t - z_th]_i)} $$

and the corresponding minimum value is

$$ [C(u_t^*) -\tilde{A}(u_t^*)(\lambda _t - z_th)]_i = -[Ae^{\lambda _t-z_th}]_i [e^{-(\lambda _t-z_th)}]_i. $$

Therefore, with the minimizing choice of \(u_t\) above,

$$\begin{aligned} \tilde{\mathsf{J}}(u,\lambda \,;\pi _0,z)&= \mathsf{D}(\pi _0\Vert \nu _0) + \int _0^T \pi _t^\top \big (-(Ae^{\lambda _t-z_th}) \cdot e^{-(\lambda _t-z_th)}\big ) \,\mathrm {d}t\\&+\int _0^T\pi _t^\top (-\dot{\lambda }_t+ {\frac{1}{2}}h^2)\,\mathrm {d}t +\pi _T^\top (\lambda _T-z_Th) - \pi _0^\top \lambda _0. \end{aligned}$$

Upon choosing \(\lambda \) according to:

$$ -[\dot{\lambda }_t]_i = [Ae^{\lambda _t-z_th}]_i [e^{-(\lambda _t-z_th)}]_i - {\frac{1}{2}}h_i^2,\quad \lambda _T = z_Th. $$

the objective function simplifies to

$$ \mathsf{D}(\pi _0\Vert \nu _0) - \pi _0^\top \lambda _0 = \sum _{i=1}^d[\pi _0]_i \log \frac{[\pi _0]_i}{[\nu _0]_ie^{[\lambda _0]_i}} $$

which is minimized by choosing

$$ [\pi _0]_i =\frac{1}{C} [\nu _0]_ie^{[\lambda _0]_i} $$

where C is the normalization constant.
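The elementwise minimization used in the proof above can be verified numerically. The sketch below makes two assumptions about definitions that appear in the main text rather than in this appendix: the control cost \([C(u)]_i = \sum _{j\ne i}[A]_{ij}([u]_{ij}\log [u]_{ij} - [u]_{ij} + 1)\) and \([\tilde{A}(u)f]_i = \sum _{j\ne i}[A]_{ij}[u]_{ij}(f_j - f_i)\). Under these assumptions the row-wise objective is minimized at \([u_t]_{ij} = e^{[\Delta ]_j - [\Delta ]_i}\) with minimum value \(-[Ae^{\Delta }]_i[e^{-\Delta }]_i\), where \(\Delta = \lambda _t - z_th\):

```python
import numpy as np

# Hypothetical 3-state generator (off-diagonal rates >= 0, rows sum to 0)
# and a stand-in vector delta for (lambda_t - z_t h).
A = np.array([[-3.0, 1.0, 2.0],
              [0.5, -1.5, 1.0],
              [2.0, 2.0, -4.0]])
delta = np.array([0.3, -0.2, 0.5])
d = len(delta)

def row_objective(u):
    # Assumed cost C(u) plus the transport term -Atilde(u)(delta), row-wise.
    val = np.zeros(d)
    for i in range(d):
        for j in range(d):
            if i != j:
                val[i] += A[i, j] * (u[i, j] * np.log(u[i, j]) - u[i, j] + 1)
                val[i] -= A[i, j] * u[i, j] * (delta[j] - delta[i])
    return val

u_star = np.exp(delta[None, :] - delta[:, None])     # claimed minimizer
min_claimed = -(A @ np.exp(delta)) * np.exp(-delta)  # claimed minimum value
print(row_objective(u_star), min_claimed)
```

Each off-diagonal term of the objective is convex in \([u]_{ij}\), so the elementwise stationary point is indeed the minimizer.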

1.5 A.5 Proof of Proposition 3

Euclidean Case. Equation (9b) is identical to the backward path-wise Eq. (4), so we only need to derive the equation for \(\mu _t\). Using the product rule,

$$\begin{aligned} \frac{\partial \mu _t}{\partial t}&= \frac{1}{\pi _t}\frac{\partial \pi _t}{\partial t} -\frac{\partial \lambda _t}{\partial t}\\&=\frac{1}{\pi _t}(\tilde{\mathcal{A}}^\dagger (u_t)\pi _t) + e^{-(\lambda _t - z_th)}(\mathcal{A}e^{\lambda _t(\cdot ) - z_th(\cdot )}) - {\frac{1}{2}}h^2. \end{aligned}$$

With optimal control \(u_t = \sigma ^\top \nabla (\lambda _t-z_th)\),

$$\begin{aligned} (\tilde{\mathcal{A}}^\dagger (u_t)\pi _t)&=(\mathcal{A}^\dagger \pi _t)-{\text {div}}\big (\sigma \sigma ^\top \nabla \pi _t\big )\\&\quad +\pi _t {\text {div}}\big (\sigma \sigma ^\top \nabla (\mu _t+z_th)\big )\\&\quad +(\nabla \pi _t)^\top (\sigma \sigma ^\top \nabla (\mu _t+z_th)) \end{aligned}$$

and

$$\begin{aligned}&e^{-(\lambda _t - z_th)}(\mathcal{A}e^{\lambda _t(\cdot ) - z_th(\cdot )})\\&=\frac{1}{\pi _t}(\mathcal{A}\pi _t) - {\frac{1}{2}}|\sigma ^\top \nabla \log \pi _t|^2 - (\mathcal{A}(\mu _t + z_th)) \\&\quad + {\frac{1}{2}}\big |\sigma ^\top \nabla \log (\pi _t) - \sigma ^\top \nabla (\mu _t +z_th)\big |^2. \end{aligned}$$

Therefore,

$$\begin{aligned} \frac{\partial \mu _t}{\partial t}&=\frac{1}{\pi _t}\big ((\mathcal{A}^\dagger \pi _t) + (\mathcal{A}\pi _t)-{\text {div}}(\sigma \sigma ^\top \nabla \pi _t)\big )\\&\quad -(\mathcal{A}(\mu _t + z_th)) + {\text {div}}\big (\sigma \sigma ^\top \nabla (\mu _t+z_th)\big )\\&\quad + {\frac{1}{2}}\big |\sigma ^\top \nabla (\mu _t+z_th)\big |^2 - {\frac{1}{2}}h^2\\&=e^{-(\mu _t(x)+z_th(x))}\big (\mathcal{A}^\dagger e^{(\mu _t(\cdot ) +z_th(\cdot ) )}\big )(x) - {\frac{1}{2}}h^2(x) \end{aligned}$$

with the boundary condition \(\mu _0 = \log \nu _0\).
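The second equality in the display above (and in Eq. (14)) rests on the standard log-transformation identity for a diffusion generator. Assuming \(\mathcal{A} = b^\top \nabla + {\frac{1}{2}}{\text {tr}}(\sigma \sigma ^\top \nabla ^2)\), a direct computation gives, for smooth f,

$$\begin{aligned} \mathcal{A}e^{f}&= b^\top \nabla e^{f} + {\frac{1}{2}}{\text {tr}}\big (\sigma \sigma ^\top \nabla ^2 e^{f}\big )\\&= e^{f}\Big (b^\top \nabla f + {\frac{1}{2}}{\text {tr}}\big (\sigma \sigma ^\top (\nabla ^2 f + \nabla f \nabla f^\top )\big )\Big )\\&= e^{f}\Big ((\mathcal{A}f) + {\frac{1}{2}}\big |\sigma ^\top \nabla f\big |^2\Big ), \end{aligned}$$

so that \(e^{-f}(\mathcal{A}e^{f}) = (\mathcal{A}f) + {\frac{1}{2}}|\sigma ^\top \nabla f|^2\). Taking \(f = \mu _t + z_th\) recovers the last equality above, and taking \(f = \lambda _t - z_th\) recovers the second equality in Eq. (14).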

Finite State-Space Case. Equation (11b) is identical to the backward path-wise Eq. (6). To derive the equation for \(\mu _t\), use the product formula

$$\begin{aligned} \Big [\frac{\,\mathrm {d}\mu _t}{\,\mathrm {d}t}\Big ]_i&= \frac{1}{[\pi _t]_i}\Big [\frac{\,\mathrm {d}\pi _t}{\,\mathrm {d}t}\Big ]_i - \Big [\frac{\,\mathrm {d}\lambda _t}{\,\mathrm {d}t}\Big ]_i\\&=\frac{1}{[\pi _t]_i}\big [\tilde{A}^\top (u_t)\pi _t\big ]_i + [e^{-(\lambda _t-z_th)}]_i[Ae^{\lambda _t-z_th}]_i - {\frac{1}{2}}[h^2]_i. \end{aligned}$$

The first term is:

$$\begin{aligned} \big [\tilde{A}^\top (u_t)\pi _t\big ]_i&= \sum _{j=1}^d \Big ([A]_{ji}[u_t]_{ji}[\pi _t]_j-[A]_{ij}[u_t]_{ij}[\pi _t]_i\Big ) \end{aligned}$$

and the second term is:

$$\begin{aligned}{}[e^{-(\lambda _t-z_th)}]_i&[Ae^{\lambda _t-z_th}]_i \\&= \frac{1}{[\pi _t]_i}[e^{\mu _t+z_th}]_i\sum _{j=1}^d [A]_{ij} [\pi _t]_j [e^{-(\mu _t +z_th)}]_j. \end{aligned}$$

The formula for the optimal control gives

$$\begin{aligned}{}[u_t]_{ij}&=\frac{[\pi _t]_j}{[\pi _t]_i}[e^{-(\mu _t + z_th)}]_j[e^{\mu _t + z_th}]_i. \end{aligned}$$

Combining these expressions,

$$\begin{aligned} \Big [\frac{\,\mathrm {d}\mu _t}{\,\mathrm {d}t}\Big ]_i&= \sum _{j=1}^d[A]_{ji}[e^{-(\mu _t + z_th)}]_i[e^{\mu _t + z_th}]_j - {\frac{1}{2}}[h^2]_i\\&=[e^{-(\mu _t + z_th)}]_i [A^\top e^{\mu _t + z_th}]_i - {\frac{1}{2}}[h^2]_i \end{aligned}$$

which is precisely the path-wise form of Eq. (5). At time \(t=0\), \([\mu _0]_i = \log (C[\pi _0]_i) - [\lambda _0]_i = \log [\nu _0]_i \).

Smoothing Distribution. Since \((\lambda _t, \mu _t)\) is the solution to the path-wise form of the Zakai equations, the optimal trajectory

$$ \pi _t = \frac{1}{C}e^{\mu _t+\lambda _t} $$

represents the smoothing distribution.
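This structure can be exercised numerically (a hypothetical 2-state example with a constant observation path \(z_t \equiv z\) and explicit Euler integration; the rates and functions below are invented for illustration): integrating \(\lambda \) backward from \(\lambda _T = z_Th\) and \(\mu \) forward from \(\mu _0 = \log \nu _0\), the total mass \(\sum _i e^{[\mu _t]_i + [\lambda _t]_i}\) stays constant in t, so a single constant C normalizes \(\pi _t\) at every time:

```python
import numpy as np

# Hypothetical 2-state model: generator A, observation function h,
# prior nu0, and a constant observation path z_t = z.
A = np.array([[-2.0, 2.0],
              [1.0, -1.0]])
h = np.array([1.0, -0.5])
nu0 = np.array([0.3, 0.7])
z, T, n = 0.4, 1.0, 5_000
dt = T / n

# Backward pass: -dlam/dt = (A exp(lam - z h)) exp(-(lam - z h)) - h^2/2
lam = np.empty((n + 1, 2))
lam[n] = z * h                       # boundary condition lambda_T = z_T h
for k in range(n, 0, -1):
    f = (A @ np.exp(lam[k] - z * h)) * np.exp(-(lam[k] - z * h)) - 0.5 * h**2
    lam[k - 1] = lam[k] + dt * f

# Forward pass: dmu/dt = exp(-(mu + z h)) (A^T exp(mu + z h)) - h^2/2
mu = np.empty((n + 1, 2))
mu[0] = np.log(nu0)                  # boundary condition mu_0 = log(nu_0)
for k in range(n):
    g = np.exp(-(mu[k] + z * h)) * (A.T @ np.exp(mu[k] + z * h)) - 0.5 * h**2
    mu[k + 1] = mu[k] + dt * g

mass = np.exp(mu + lam).sum(axis=1)  # should be (approximately) constant
print(mass[0], mass[-1])
```

The conservation of total mass follows from \(\langle e^{\lambda -zh}, A^\top e^{\mu +zh}\rangle = \langle e^{\mu +zh}, Ae^{\lambda -zh}\rangle \); the residual drift in the simulation is the Euler discretization error.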

1.6 A.6 Proof of Proposition 4

The dynamic programming equation for the optimal control problem is given by (see  [1, Ch. 11.2]):

$$\begin{aligned} \min _{u\in \mathbb {R}^p} \Big \{\frac{\partial V_t}{\partial t}(x) + (\tilde{\mathcal{A}}(u) V_t)(x) + l(x,u\,;z_t)\Big \} = 0. \end{aligned}$$
(15)

Therefore,

$$\begin{aligned} -\frac{\partial V_t}{\partial t}(x)&= (\mathcal{A}V_t)(x) + {\frac{1}{2}}h^2(x) + z_t(\mathcal{A}h)(x) \\&+ \min _{u}\Big \{{\frac{1}{2}}|u|^2 + u^\top \big (\sigma ^\top \nabla V_t(x) + z_t\sigma ^\top \nabla h(x)\big )\Big \}. \end{aligned}$$

Upon using the completion-of-squares trick, the minimum is attained in feedback form:

$$ u^* = -\sigma ^\top \nabla (V_t +z_th)(x). $$

The resulting HJB equation is given by

$$\begin{aligned} -\frac{\partial V_t}{\partial t}(x)&= \big (\mathcal{A}(V_t + z_th)\big )(x) + {\frac{1}{2}}h^2(x) -{\frac{1}{2}}|\sigma ^\top \nabla (V_t+z_th)(x)|^2 \end{aligned}$$

with boundary condition \(V_T(x) = - z_Th(x)\). Comparing the HJB equation with Eq. (14) for \(\lambda \), it follows that

$$ V_t(x) = -\lambda _t(x). $$
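As a consistency check, substituting \(V_t = -\lambda _t\) into the HJB equation gives

$$\begin{aligned} \frac{\partial \lambda _t}{\partial t}(x)&= \big (\mathcal{A}(-\lambda _t + z_th)\big )(x) + {\frac{1}{2}}h^2(x) - {\frac{1}{2}}\big |\sigma ^\top \nabla (\lambda _t - z_th)(x)\big |^2, \end{aligned}$$

which, upon multiplying through by \(-1\) and using the linearity of \(\mathcal{A}\), is precisely Eq. (14); the boundary conditions \(V_T = -z_Th\) and \(\lambda _T = z_Th\) also match.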


Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kim, J.W., Mehta, P.G. (2020). An Optimal Control Derivation of Nonlinear Smoothing Equations. In: Junge, O., Schütze, O., Froyland, G., Ober-Blöbaum, S., Padberg-Gehle, K. (eds) Advances in Dynamics, Optimization and Computation. SON 2020. Studies in Systems, Decision and Control, vol 304. Springer, Cham. https://doi.org/10.1007/978-3-030-51264-4_12


  • DOI: https://doi.org/10.1007/978-3-030-51264-4_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-51263-7

  • Online ISBN: 978-3-030-51264-4

