Learning control Lyapunov functions from counterexamples and demonstrations
We present a technique for learning control Lyapunov-like functions, which are then used to synthesize controllers for nonlinear dynamical systems. The resulting controllers can stabilize the system, keep it inside a safe set, or drive it to a target set while remaining inside a safe set. The learning framework has three components: a demonstrator that implements a black-box, untrusted strategy presumed to solve the problem of interest; a learner that poses finitely many queries to the demonstrator to infer a candidate function; and a verifier that checks whether the current candidate is a valid control Lyapunov-like function. The framework is iterative: on each iteration, it eliminates a set of candidates using the counterexamples discovered by the verifier and the demonstrations at these counterexamples. We prove its convergence using ellipsoidal approximation techniques from convex optimization. We also implement this scheme using nonlinear MPC controllers as demonstrators for a set of state and trajectory stabilization problems for nonlinear dynamical systems. We show how the verifier can be constructed efficiently for polynomial systems by relaxing the verification problem to semidefinite programming instances. Our approach is able to synthesize relatively simple polynomial control Lyapunov-like functions and, in the process, replace the MPC with a controller that carries guarantees and is computationally less expensive.
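The interaction between demonstrator, learner, and verifier can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the construction used in this work: the system is a hand-picked planar linear system, the demonstrator is a fixed clipped linear feedback standing in for an MPC, the candidates come from a small finite grid rather than a continuous parameter set explored with ellipsoidal methods, and the verifier only samples a grid of states instead of solving a semidefinite relaxation. All function and variable names are hypothetical.

```python
from itertools import product

U_MAX = 4.0  # assumed input bound: u in [-U_MAX, U_MAX]

def V(c, x):
    """Candidate V_c(x) = x1^2 + c1*x1*x2 + c2*x2^2."""
    (c1, c2), (x1, x2) = c, x
    return x1 * x1 + c1 * x1 * x2 + c2 * x2 * x2

def V_dot(c, x, u):
    """Derivative of V_c along the toy dynamics x1' = x2, x2' = x1 + u."""
    (c1, c2), (x1, x2) = c, x
    return (2 * x1 + c1 * x2) * x2 + (c1 * x1 + 2 * c2 * x2) * (x1 + u)

def demonstrator(x):
    """Stand-in for the MPC: a fixed stabilizing feedback, clipped to the bound."""
    u = -2 * x[0] - 2 * x[1]
    return max(-U_MAX, min(U_MAX, u))

# Finite candidate set and finite set of states inspected by the sampled verifier.
GRID = [i / 2 for i in range(-4, 5)]
CANDIDATES = list(product(GRID, GRID))
TEST_STATES = [(i / 4, j / 4) for i in range(-4, 5) for j in range(-4, 5)
               if (i, j) != (0, 0)]

def consistent(c, observation):
    """A candidate must be positive and decrease under the demonstrated input."""
    x, u = observation
    return V(c, x) > 0 and V_dot(c, x, u) < 0

def learner(observations):
    """Return the first candidate consistent with all demonstrations, if any."""
    for c in CANDIDATES:
        if all(consistent(c, o) for o in observations):
            return c
    return None

def verifier(c):
    """Return a sampled state where c fails the CLF conditions, or None."""
    for x in TEST_STATES:
        # V_dot is affine in u, so its minimum over the interval is at an endpoint.
        best = min(V_dot(c, x, -U_MAX), V_dot(c, x, U_MAX))
        if V(c, x) <= 0 or best >= 0:
            return x
    return None

def learn_clf():
    """Iterate learner and verifier, querying the demonstrator at counterexamples."""
    observations = []
    while True:
        c = learner(observations)
        if c is None:
            return None, observations   # candidate set exhausted
        x = verifier(c)
        if x is None:
            return c, observations      # no counterexample among sampled states
        observations.append((x, demonstrator(x)))

if __name__ == "__main__":
    candidate, observations = learn_clf()
    print("candidate:", candidate, "demonstrations used:", len(observations))
```

In this sketch each counterexample, together with the demonstrated input at that state, rules out at least the current candidate, so the loop terminates after finitely many rounds. A result of `None` for the candidate would mean the grid contains no function compatible with the demonstrations; a returned candidate is only checked on the sampled states, which is much weaker than the guarantees discussed above.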
Keywords: Lyapunov functions, Controller synthesis, Learning from demonstrations, Concept learning
We are grateful to Mr. Sina Aghli, Mr. Souradeep Dutta, Prof. Christoffer Heckman and Prof. Eduardo Sontag for helpful discussions. This work was funded in part by NSF under Award Numbers SHF 1527075 and CPS 1646556. All opinions expressed are those of the authors and not necessarily of the NSF.