Abstract
The purpose of this paper is to review and highlight some connections between the problem of nonlinear smoothing and the optimal control of the Liouville equation. The latter has been an active area of recent research interest owing to work in mean-field games and optimal transportation theory. The nonlinear smoothing problem is considered here for continuous-time Markov processes. The observation process is modeled as a nonlinear function of the hidden state with additive Gaussian measurement noise. A variational formulation is described, based upon the relative entropy formula introduced by Newton and Mitter. The resulting optimal control problem is formulated on the space of probability distributions. Hamilton's equations of optimal control are related to the Zakai equation of nonlinear smoothing via the log transformation. The overall procedure is shown to generalize Mortensen's classical minimum-energy estimator for the linear Gaussian problem.
To Michael Dellnitz on the occasion of his 60th birthday.
References
Bensoussan, A.: Estimation and Control of Dynamical Systems, vol. 48. Springer, Heidelberg (2018)
Bensoussan, A., Frehse, J., Yam, P., et al.: Mean Field Games and Mean Field Type Control Theory, vol. 101. Springer, Heidelberg (2013)
Brockett, R.W.: Optimal control of the Liouville equation. AMS IP Stud. Adv. Math. 39, 23 (2007)
Carmona, R., Delarue, F., et al.: Probabilistic Theory of Mean Field Games with Applications I-II. Springer, Heidelberg (2018)
Chen, Y., Georgiou, T.T., Pavon, M.: On the relation between optimal transport and Schrödinger bridges: a stochastic control viewpoint. J. Optim. Theory Appl. 169(2), 671–691 (2016)
Chetrite, R., Touchette, H.: Variational and optimal control representations of conditioned and driven processes. J. Stat. Mech.: Theory Exp. 2015(12), P12001 (2015)
Fleming, W., Mitter, S.: Optimal control and nonlinear filtering for nondegenerate diffusion processes. Stochastics 8, 63–77 (1982)
Kailath, T., Sayed, A.H., Hassibi, B.: Linear Estimation. Prentice Hall, Upper Saddle River (2000)
Kappen, H.J., Ruiz, H.C.: Adaptive importance sampling for control and inference. J. Stat. Phys. 162(5), 1244–1266 (2016)
Mitter, S.K., Newton, N.J.: A variational approach to nonlinear estimation. SIAM J. Control Optim. 42(5), 1813–1833 (2003)
Mortensen, R.E.: Maximum-likelihood recursive nonlinear filtering. J. Optim. Theory Appl. 2(6), 386–394 (1968)
Pardoux, E.: Non-linear filtering, prediction and smoothing. In: Stochastic Systems: the Mathematics of Filtering and Identification and Applications, pp. 529–557. Springer (1981)
Reich, S.: Data assimilation: the Schrödinger perspective. Acta Numerica 28, 635–711 (2019)
Rogers, L.C.G., Williams, D.: Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus. Cambridge University Press, Cambridge (2000)
Ruiz, H., Kappen, H.J.: Particle smoothing for hidden diffusion processes: adaptive path integral smoother. IEEE Trans. Signal Process. 65(12), 3191–3203 (2017)
Sutter, T., Ganguly, A., Koeppl, H.: A variational approach to path estimation and parameter inference of hidden diffusion processes. J. Mach. Learn. Res. 17, 6544–6580 (2016)
Van Handel, R.: Filtering, stability, and robustness. Ph.D. thesis, California Institute of Technology (2006)
A Appendix
1.1 A.1 Derivation of Lagrangian: Euclidean Case
By Girsanov’s theorem, the Radon-Nikodym derivative is obtained (see [13, Eqn. 35]) as follows:
Thus, we obtain the relative entropy formula:
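The displayed formula is not reproduced here. As a point of reference, a standard sketch of this computation, under the assumption that the controlled path measure \(P\) perturbs the drift of the prior \(Q\) by \(\sigma u_t\), is:

```latex
% Girsanov: with dX_t = b(X_t)dt + \sigma\, dW_t under Q, and
% dX_t = (b(X_t) + \sigma u_t)dt + \sigma\, dB_t under P,
\log\frac{\mathrm{d}P}{\mathrm{d}Q}
   = \int_0^T u_t^\top \mathrm{d}B_t
   + \frac{1}{2}\int_0^T |u_t|^2\,\mathrm{d}t ,
% so that, upon taking expectation under P (the stochastic integral
% is a zero-mean martingale),
\mathrm{D}(P\,\|\,Q)
   = \mathbb{E}_P\Big[\log\frac{\mathrm{d}P}{\mathrm{d}Q}\Big]
   = \frac{1}{2}\,\mathbb{E}_P\Big[\int_0^T |u_t|^2\,\mathrm{d}t\Big].
```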
1.2 A.2 Derivation of Lagrangian: Finite State-Space Case
The derivation of the Lagrangian is entirely analogous to the Euclidean case, except that the Radon–Nikodym derivative is given according to [17, Prop. 2.1.1]:
Upon taking log and expectation of both sides, we arrive at the relative entropy formula:
1.3 A.3 Proof of Proposition 1
The standard approach is to incorporate the constraint into the objective function by introducing the Lagrange multiplier \(\lambda = \{\lambda _t:0\le t\le T\}\) as follows:
Upon using integration by parts and the definition of the adjoint operator, after some manipulation involving completion of squares, we arrive at
Therefore, it is natural to pick \(\lambda \) to satisfy the following partial differential equation:
with the boundary condition \(\lambda _T(x) = z_Th(x)\). With this choice, the objective function becomes
which suggests that the optimal choice of control is:
With this choice, the objective function becomes
which is minimized by choosing
where C is the normalization constant.
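The displayed equations of this proof are not reproduced above. For orientation, the integration-by-parts step uses the defining identity of the formal adjoint \(\mathcal{A}^\dagger \) of the generator \(\mathcal{A}\), and, by analogy with the finite state-space expression at the end of Appendix A.5, the minimizing initial density takes the exponential form sketched below (both are sketches under the stated assumptions, not the paper's displayed equations):

```latex
% Defining identity of the formal adjoint (integration by parts,
% assuming boundary terms vanish):
\int_{\mathbb{R}^d} (\mathcal{A} f)(x)\,\rho(x)\,\mathrm{d}x
   = \int_{\mathbb{R}^d} f(x)\,(\mathcal{A}^\dagger \rho)(x)\,\mathrm{d}x .
% Minimizing initial density, by analogy with the finite state-space
% formula [\nu_0]_i = C\,[\pi_0]_i\, e^{-[\lambda_0]_i} in A.5:
\nu_0(x) = C\,\pi_0(x)\,e^{-\lambda_0(x)},
\qquad
C = \Big(\int_{\mathbb{R}^d} \pi_0(x)\,e^{-\lambda_0(x)}\,\mathrm{d}x\Big)^{-1}.
```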
1.4 A.4 Proof of Proposition 2
The proof for the finite state-space case is entirely analogous to the proof for the Euclidean case. The Lagrange multiplier \(\lambda = \{\lambda _t\in \mathbb {R}^d: 0\le t\le T\}\) is introduced to transform the optimization problem into an unconstrained problem:
Upon using integration by parts,
The first integrand is
The minimizer is obtained, element by element, as
and the corresponding minimum value is obtained by:
Therefore, with the minimizing choice of \(u_t\) above,
Upon choosing \(\lambda \) according to:
the objective function simplifies to
where the minimum value is obtained by choosing
where C is the normalization constant.
1.5 A.5 Proof of Proposition 3
Euclidean Case. Equation (9b) is identical to the backward path-wise Eq. (4). So, we only need to derive the equation for \(\mu _t\). Using the regular form of the product formula,
With optimal control \(u_t = \sigma ^\top \nabla (\lambda _t-z_th)\),
and
Therefore,
with the boundary condition \(\mu _0 = \log \nu _0\).
Finite State-Space Case. Equation (11b) is identical to the backward path-wise Eq. (6). To derive the equation for \(\mu _t\), use the product formula
The first term is:
and the second term is:
The formula for the optimal control gives
Combining these expressions,
which is precisely the path-wise form of Eq. (5). At time \(t=0\), \(\mu _0 = \log (C[\pi _0]_i) - [\lambda _0]_i = \log [\nu _0]_i \).
Smoothing Distribution. Since \((\lambda _t, \mu _t)\) is the solution to the path-wise form of the Zakai equations, the optimal trajectory
represents the smoothing distribution.
1.6 A.6 Proof of Proposition 4
The dynamic programming equation for the optimal control problem is given by (see [1, Ch. 11.2]):
Therefore,
Upon using the completion-of-square trick, the minimum is attained by a feedback form:
The resulting HJB equation is given by
with boundary condition \(V_T(x) = - z_Th(x)\). Comparing the HJB equation with Eq. (14) for \(\lambda \), it follows that
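For the reader's convenience, the two standard calculations invoked in this proof can be sketched as follows. The specific form of the Hamiltonian below is an assumption, chosen to be consistent with the optimal control \(u_t = \sigma ^\top \nabla (\lambda _t-z_th)\) in Appendix A.5 and the boundary condition \(V_T = -z_Th\):

```latex
% Completion of squares: for any vector a,
\min_{u}\Big\{ a^\top u + \tfrac{1}{2}|u|^2 \Big\} = -\tfrac{1}{2}|a|^2 ,
\qquad \text{attained at } u^* = -a .
% With a = \sigma^\top \nabla V_t this yields the feedback form
u_t^* = -\sigma^\top \nabla V_t ,
% which matches u_t = \sigma^\top \nabla(\lambda_t - z_t h) under the
% identification V_t = -(\lambda_t - z_t h).
%
% Log (Hopf--Cole) transformation: substituting V_t = -\log p_t removes
% the quadratic term \tfrac{1}{2}|\sigma^\top \nabla V_t|^2 from the HJB
% equation, yielding a linear equation for p_t; this is the mechanism by
% which Hamilton's equations are related to the path-wise Zakai equation.
```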
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Kim, J.W., Mehta, P.G. (2020). An Optimal Control Derivation of Nonlinear Smoothing Equations. In: Junge, O., Schütze, O., Froyland, G., Ober-Blöbaum, S., Padberg-Gehle, K. (eds) Advances in Dynamics, Optimization and Computation. SON 2020. Studies in Systems, Decision and Control, vol 304. Springer, Cham. https://doi.org/10.1007/978-3-030-51264-4_12
Print ISBN: 978-3-030-51263-7
Online ISBN: 978-3-030-51264-4