Motion planning of a quadrotor robot game using a simulation-based projected policy iteration method

Abstract

Making rational decisions for sequential decision problems in complex environments has challenged researchers in various fields for decades. Such problems involve state transition dynamics, stochastic uncertainties, long-term utilities, and other factors that raise significant barriers, including the curse of dimensionality. Recently, state-of-the-art reinforcement learning algorithms have been developed that show strong potential for overcoming these barriers efficiently and for handling complex, practical decision problems with good performance. We propose a formulation of a velocity-varying, one-on-one quadrotor robot game in three-dimensional space, together with an approximate dynamic programming approach that uses a projected policy iteration method to learn the utilities of game states and improve motion policies. In addition, a simulation-based iterative scheme is employed to overcome the curse of dimensionality. Simulation results demonstrate that the proposed decision strategy generates effective and efficient motion policies that can contend with the opponent quadrotor and gain an advantageous status during the game. Flight experiments conducted in the Networked Autonomous Vehicles (NAV) Lab at Concordia University further validate the performance of the proposed decision strategy in a real-time environment.
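
As a rough illustration of the approach named above (simulation-based approximate policy iteration with a projected, least-squares evaluation of a feature-based value function), the following is a minimal sketch on a toy discrete game. All names and quantities here (simulate_step, features, the state and action counts, the random transition model) are hypothetical placeholders and this is not the authors' quadrotor-game implementation.

```python
# Illustrative sketch only: generic simulation-based projected policy iteration
# with linear value-function approximation on a toy discrete "game" MDP.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 50     # abstract game states (placeholder for the quadrotor game state space)
N_ACTIONS = 4     # candidate motion primitives (placeholder)
GAMMA = 0.95      # discount factor for the long-term utility

# Fixed random transition/reward model used only to drive the toy simulator.
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))  # P[s, a] is a next-state distribution
R = rng.normal(size=(N_STATES, N_ACTIONS))

def simulate_step(s, a):
    """Sample one transition from the (hypothetical) game simulator."""
    s_next = rng.choice(N_STATES, p=P[s, a])
    return s_next, R[s, a]

def features(s, k=8):
    """Fixed nonlinear features of the state; stands in for hand-crafted
    game-state features (relative position, heading, speed, etc.)."""
    return np.cos(np.arange(1, k + 1) * (s + 1) * 0.37)

def projected_evaluation(policy, n_samples=5000):
    """Projected (least-squares, LSTD-style) fit of J_mu(s) ~ phi(s)^T w
    from simulated transitions under the current policy mu."""
    k = features(0).size
    A = np.zeros((k, k))
    b = np.zeros(k)
    s = int(rng.integers(N_STATES))
    for _ in range(n_samples):
        a = policy[s]
        s_next, r = simulate_step(s, a)
        phi, phi_next = features(s), features(s_next)
        A += np.outer(phi, phi - GAMMA * phi_next)   # accumulate the projected fixed-point system
        b += r * phi
        s = s_next
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)  # small ridge term for numerical safety

def greedy_improvement(w, n_rollouts=30):
    """One-step greedy policy improvement using sampled lookahead."""
    policy = np.zeros(N_STATES, dtype=int)
    for s in range(N_STATES):
        q = np.zeros(N_ACTIONS)
        for a in range(N_ACTIONS):
            for _ in range(n_rollouts):
                s_next, r = simulate_step(s, a)
                q[a] += (r + GAMMA * features(s_next) @ w) / n_rollouts
        policy[s] = int(np.argmax(q))
    return policy

policy = rng.integers(N_ACTIONS, size=N_STATES)      # arbitrary initial policy
for it in range(10):                                  # approximate policy iteration loop
    w = projected_evaluation(policy)
    policy = greedy_improvement(w)
    mean_value = np.mean([features(s) @ w for s in range(N_STATES)])
    print(f"iteration {it}: mean approximate value = {mean_value:.3f}")
```

In this sketch, the evaluation step accumulates an LSTD-style linear system from simulated transitions and solves it for the feature weights (the projected fixed point), while the improvement step is one-step greedy with sampled lookahead; alternating the two steps is the policy iteration loop the abstract refers to.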

Author information

Corresponding author

Correspondence to You-min Zhang.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61573282 and 61833013), a scholarship from the China Scholarship Council (No. 201606100139), and the Natural Sciences and Engineering Research Council of Canada.

About this article

Cite this article

Zhang, L.D., Wang, B., Liu, Z.X., et al. Motion planning of a quadrotor robot game using a simulation-based projected policy iteration method. Frontiers Inf Technol Electronic Eng 20, 525–537 (2019). https://doi.org/10.1631/FITEE.1800571

Key words

  • Reinforcement learning
  • Approximate dynamic programming
  • Decision making
  • Motion planning
  • Unmanned aerial vehicle

CLC number

  • TP242