
A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces

Abstract

In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the optimal value of a convex program. The approximate Bellman operator has a computational advantage because it involves a convex optimization problem in the case of control-affine systems and convex costs. Using this feature, we propose a simple dynamic programming algorithm to evaluate the approximate value function at pre-specified grid points by solving convex optimization problems in each iteration. We show that the proposed method approximates the optimal value function with a uniform convergence property in the case of convex optimal value functions. We also propose an interpolation-free design method for a control policy, whose performance converges uniformly to the optimum as the grid resolution becomes finer. When a nonlinear control-affine system is considered, the convex optimization approach provides an approximate policy with a provable suboptimality bound. For general cases, the proposed convex formulation of dynamic programming operators can be recast as a nonconvex bilevel program, in which the inner problem is a linear program, without losing the uniform convergence properties.
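For reference, the Bellman operator whose output the paper approximates takes the familiar discounted-cost form below. The notation here is the standard one from dynamic programming texts and is a sketch only; the symbols \(c\), \(\gamma \), \(f\), and \({\varvec{\xi }}\) are generic and may differ from the paper's exact formulation.

$$\begin{aligned} (T v)({\varvec{x}}) = \min _{{\varvec{u}} \in {\mathcal {U}}({\varvec{x}})} \Big \{ c({\varvec{x}}, {\varvec{u}}) + \gamma \, {\mathbb {E}} \big [ v \big ( f({\varvec{x}}, {\varvec{u}}, {\varvec{\xi }}) \big ) \big ] \Big \}, \end{aligned}$$

where \(c\) is a stage cost, \(\gamma \in (0, 1)\) a discount factor, and \(f\) the (control-affine) system dynamics. The paper's key step is to replace the right-hand side, evaluated at a grid point, by the optimal value of a convex program.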


Notes

  1. However, our method is suitable for problems with high-dimensional action spaces.

  2. More precisely, the set \({\mathcal {U}}({\varvec{x}})\) needs to be represented by convex inequalities, i.e., there exist functions \(a_k: {\mathcal {X}} \times {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) and \(b_k: {\mathcal {X}} \rightarrow {\mathbb {R}}\) such that

    $$\begin{aligned} {\mathcal {U}}({\varvec{x}}) := \{ {\varvec{u}} \in {\mathbb {R}}^m : a_k ({\varvec{x}}, {\varvec{u}}) \le b_k({\varvec{x}}), k=1, \ldots , N_{ineq}\}, \end{aligned}$$

    where \({\varvec{u}} \mapsto a_k ({\varvec{x}}, {\varvec{u}})\) is a convex function for each fixed \({\varvec{x}} \in {\mathcal {X}}\) and each k.
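As an illustrative instance (not spelled out in the note), a state-dependent box constraint fits this form:

$$\begin{aligned} {\mathcal {U}}({\varvec{x}}) = \{ {\varvec{u}} \in {\mathbb {R}}^m : \underline{{\varvec{u}}}({\varvec{x}}) \le {\varvec{u}} \le \overline{{\varvec{u}}}({\varvec{x}}) \}, \end{aligned}$$

obtained by taking \(a_k({\varvec{x}}, {\varvec{u}}) := u_k\), \(b_k({\varvec{x}}) := \overline{u}_k({\varvec{x}})\) and \(a_{m+k}({\varvec{x}}, {\varvec{u}}) := -u_k\), \(b_{m+k}({\varvec{x}}) := -\underline{u}_k({\varvec{x}})\) for \(k = 1, \ldots , m\), so that \(N_{ineq} = 2m\); each map \({\varvec{u}} \mapsto a_k({\varvec{x}}, {\varvec{u}})\) is linear and hence convex for every fixed \({\varvec{x}}\).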

  3. Note that the convexity of v is unused in the second part of the proof of Proposition 3.1; thus, that part of the argument remains valid in the nonconvex case.

  4. The matrix B used in our experiments can be downloaded from the following link: http://coregroup.snu.ac.kr/DB/B1000.mat.

  5. The CPU time increases superlinearly with the number of grid points because the size of the optimization problem (5) also grows with the grid size. Note that the problem size remains constant when the bilevel method in Sect. 4.2 is used; in that case, the CPU time scales linearly, as shown in Table 3.

  6. The observed second-order empirical convergence rate is consistent with our theoretical result, since Theorem 3.1 only guarantees that the suboptimality gap decreases at a first-order rate; the actual convergence rate can therefore be higher than the rate established for the suboptimality gap.

  7. To compute the optimal value function, we used the method in Sect. 4.2, discretizing the action space with 1001 equally spaced grid points.

  8. The forward reachable set can be over-approximated analytically, particularly when a loose approximation is acceptable. For a higher-quality approximation, one may use advanced computational techniques such as semidefinite approximation [37] and ellipsoidal approximation [38], among others.
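To make the analytical over-approximation concrete, here is a minimal sketch (not from the paper; the linear dynamics and all names are illustrative assumptions) that bounds the one-step forward reachable set of \(x^+ = Ax + Bu\), with box-bounded state and input, by an axis-aligned box via interval arithmetic:

```python
import numpy as np

def box_reach(A, B, x_lb, x_ub, u_lb, u_ub):
    """Componentwise interval bound on {A x + B u : x, u in their boxes}.

    Illustrative sketch for linear dynamics; returns (lower, upper) corners
    of a box that over-approximates the forward reachable set.
    """
    def interval_matvec(M, lb, ub):
        # Positive coefficients attain the max at the upper bound,
        # negative coefficients at the lower bound (and vice versa).
        Mp, Mn = np.maximum(M, 0.0), np.minimum(M, 0.0)
        return Mp @ lb + Mn @ ub, Mp @ ub + Mn @ lb

    ax_lb, ax_ub = interval_matvec(A, x_lb, x_ub)
    bu_lb, bu_ub = interval_matvec(B, u_lb, u_ub)
    return ax_lb + bu_lb, ax_ub + bu_ub
```

Such a box is generally loose but cheap to compute; the semidefinite [37] and ellipsoidal [38] techniques trade additional computation for tightness.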

References

  1. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (2014)

  2. Kushner, H., Dupuis, P.G.: Numerical Methods for Stochastic Control Problems in Continuous Time. Springer, New York (2013)

  3. Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York (2012)

  4. Savorgnan, C., Lasserre, J.B., Diehl, M.: Discrete-time stochastic optimal control via occupation measures and moment relaxations. In: Proceedings of Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference (2009)

  5. Dufour, F., Prieto-Rumeau, T.: Finite linear programming approximations of constrained discounted Markov decision processes. SIAM J. Control Optim. 51(2), 1298–1324 (2013)

  6. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)

  7. Bertsekas, D.P.: Reinforcement Learning and Optimal Control. Athena Scientific, Belmont (2019)

  8. Szepesvari, C.: Algorithms for Reinforcement Learning. Morgan and Claypool Publishers, San Rafael (2010)

  9. Bertsekas, D.P.: Convergence of discretization procedures in dynamic programming. IEEE Trans. Autom. Control 20(3), 415–419 (1975)

  10. Langen, H.-J.: Convergence of dynamic programming models. Math. Oper. Res. 6(4), 493–512 (1981)

  11. Whitt, W.: Approximations of dynamic programs, I. Math. Oper. Res. 3(3), 231–243 (1978)

  12. Hinderer, K.: On approximate solutions of finite-stage dynamic programs. In: Puterman, M.L. (ed.) Dynamic Programming and Its Applications, pp. 289–317. Academic Press, New York (1978)

  13. Chow, C.-S., Tsitsiklis, J.N.: An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Trans. Autom. Control 36(8), 898–914 (1991)

  14. Dufour, F., Prieto-Rumeau, T.: Approximation of Markov decision processes with general state space. J. Math. Anal. Appl. 388, 1254–1267 (2012)

  15. Dufour, F., Prieto-Rumeau, T.: Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities. Stochast. Int. J. Probab. Stochast. Process. 87(2), 273–307 (2015)

  16. Saldi, N., Yüksel, S., Linder, T.: On the asymptotic optimality of finite approximations to Markov decision processes with Borel spaces. Math. Oper. Res. 42(4), 945–978 (2017)

  17. Hernández-Lerma, O.: Discretization procedures for adaptive Markov control processes. J. Math. Anal. Appl. 137, 485–514 (1989)

  18. Johnson, S.A., Stedinger, J.R., Shoemaker, C.A., Li, Y., Tejada-Guibert, J.A.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41(3), 484–500 (1993)

  19. Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Springer, New York (2012)

  20. Rust, J.: Using randomization to break the curse of dimensionality. Econometrica 65(3), 487–516 (1997)

  21. Munos, R., Szepesvári, C.: Finite-time bounds for fitted value iteration. J. Mach. Learn. Res. 1, 815–857 (2008)

  22. Haskell, W.B., Jain, R., Sharma, H., Yu, P.: A universal empirical dynamic programming algorithm for continuous state MDPs. IEEE Trans. Autom. Control 65(1), 115–129 (2020)

  23. Jang, S., Yang, I.: Stochastic subgradient methods for dynamic programming in continuous state and action spaces. In: Proceedings of the 58th IEEE Conference on Decision and Control, pp. 7287–7293 (2019)

  24. Nesterov, Y.: Lectures on Convex Optimization, 2nd edn. Springer, Cham (2018)

  25. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  26. Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)

  27. Falcone, M., Ferretti, R.: Semi-Lagrangian Approximation Schemes for Linear and Hamilton-Jacobi Equations. SIAM, Philadelphia (2013)

  28. Alla, A., Falcone, M., Saluzzi, L.: An efficient DP algorithm on a tree-structure for finite horizon optimal control problems. SIAM J. Sci. Comput. 41(4), A2384–A2406 (2019)

  29. Picarelli, A., Reisinger, C.: Probabilistic error analysis for some approximation schemes to optimal control problems. Syst. Control Lett. 137, 104619 (2020)

  30. Yang, I.: A convex optimization approach to dynamic programming in continuous state and action spaces. arXiv preprint arXiv:1810.03847 (2018)

  31. Abate, A., Prandini, M., Lygeros, J., Sastry, S.: Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems. Automatica 44, 2724–2734 (2008)

  32. Yang, I.: A dynamic game approach to distributionally robust safety specifications for stochastic systems. Automatica 94, 94–101 (2018)

  33. Dantzig, G.B.: Linear Programming and Extensions. Princeton University Press, Princeton (1998)

  34. Bertsimas, D., Tsitsiklis, J.N.: Introduction to Linear Optimization. Athena Scientific, Belmont (1997)

  35. Sethi, S.P., Thompson, G.L.: Optimal Control Theory: Applications to Management Science and Economics. Springer, New York (2000)

  36. Dubins, L.E.: On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. Am. J. Math. 79(3), 497–516 (1957)

  37. Magron, V., Garoche, P.-L., Henrion, D., Thirioux, X.: Semidefinite approximations of reachable sets for discrete-time polynomial systems. SIAM J. Control Optim. 57(4), 2799–2820 (2019)

  38. Kurzhanskiy, A.A., Varaiya, P.: Reach set computation and control synthesis for discrete-time dynamical systems with disturbances. Automatica 47, 1414–1426 (2011)


Acknowledgements

This work was supported in part by the Creative-Pioneering Researchers Program through SNU, the National Research Foundation of Korea funded by the MSIT (2020R1C1C1009766), and Samsung Electronics.

Author information


Corresponding author

Correspondence to Insoon Yang.

Additional information

Communicated by Lars Grüne.


Appendix: State Space Discretization Using a Rectilinear Grid


In this appendix, we provide a concrete way to discretize the state space using a rectilinear grid. The construction below satisfies all the conditions in Sect. 2.3.

  1. Choose a convex compact set \({\mathcal {Z}}_0 := [\underline{{\varvec{x}}}_{0, 1}, \overline{{\varvec{x}}}_{0, 1}] \times [\underline{{\varvec{x}}}_{0, 2}, \overline{{\varvec{x}}}_{0, 2}] \times \cdots \times [\underline{{\varvec{x}}}_{0, n}, \overline{{\varvec{x}}}_{0, n}]\), and discretize it using an n-dimensional rectilinear grid. Set \(t \leftarrow 0\).

  2. Compute (or over-approximate) the forward reachable set (Footnote 8)

     $$\begin{aligned} R_{t} := \big \{ f({\varvec{x}}, {\varvec{u}}, {\varvec{\xi }}) : {\varvec{x}} \in {\mathcal {Z}}_{t}, {\varvec{u}} \in {\mathcal {U}}({\varvec{x}}), {\varvec{\xi }} \in \varXi \big \}. \end{aligned}$$

  3. Choose a convex compact set \({\mathcal {Z}}_{t+1} := [\underline{{\varvec{x}}}_{t+1, 1}, \overline{{\varvec{x}}}_{t+1, 1}] \times [\underline{{\varvec{x}}}_{t+1, 2}, \overline{{\varvec{x}}}_{t+1, 2}] \times \cdots \times [\underline{{\varvec{x}}}_{t+1, n}, \overline{{\varvec{x}}}_{t+1, n}]\) such that \(R_t \subseteq {\mathcal {Z}}_{t+1}\).

  4. Expand the rectilinear grid to fit \({\mathcal {Z}}_{t+1}\).

  5. Stop if \(t+1 = K\); otherwise, set \(t \leftarrow t+1\) and go to Step 2.

We can then choose \({\mathcal {C}}_i\) as each grid cell. We label \({\mathcal {C}}_i\) so that \(\bigcup _{i=1}^{N_{{\mathcal {C}}, t}} {\mathcal {C}}_i = {\mathcal {Z}}_t\) for all t. A two-dimensional example is shown in Fig. 1. This construction approach was used in Sects. 5.1 and 5.3.
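The five-step procedure above can be sketched in code for the special case of linear dynamics \(x^+ = Ax + Bu\) with box-bounded state and input. This is an illustrative assumption, not the paper's implementation: the reachable set in Step 2 is over-approximated by a loose interval box rather than by the techniques of [37, 38], and all names are hypothetical.

```python
import numpy as np

def expand_grids(A, B, x_lb, x_ub, u_lb, u_ub, K, points_per_dim=11):
    """Return per-axis grid vectors for the boxes Z_0, ..., Z_K.

    Illustrative sketch of the appendix procedure for linear dynamics
    x+ = A x + B u, with a box-valued interval bound on the reachable set.
    """
    grids = []
    lb, ub = np.asarray(x_lb, float), np.asarray(x_ub, float)
    u_lb, u_ub = np.asarray(u_lb, float), np.asarray(u_ub, float)
    for t in range(K + 1):
        # Steps 1 and 4: rectilinear grid covering the current box Z_t.
        grids.append([np.linspace(lo, hi, points_per_dim)
                      for lo, hi in zip(lb, ub)])
        # Step 2: interval over-approximation of R_t = {A x + B u}.
        Ap, An = np.maximum(A, 0.0), np.minimum(A, 0.0)
        Bp, Bn = np.maximum(B, 0.0), np.minimum(B, 0.0)
        new_lb = Ap @ lb + An @ ub + Bp @ u_lb + Bn @ u_ub
        new_ub = Ap @ ub + An @ lb + Bp @ u_ub + Bn @ u_lb
        # Step 3: take Z_{t+1} as the smallest box containing R_t.
        lb, ub = new_lb, new_ub
    return grids
```

The parameter `points_per_dim` plays the role of the grid resolution; refining it shrinks the cells \({\mathcal {C}}_i\) without changing the covered boxes \({\mathcal {Z}}_t\).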


Cite this article

Yang, I. A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces. J Optim Theory Appl 187, 133–157 (2020). https://doi.org/10.1007/s10957-020-01747-1


Keywords

  • Dynamic programming
  • Convex optimization
  • Optimal control
  • Stochastic control

Mathematics Subject Classification

  • 90C39
  • 49L20
  • 90C25