Abstract
This paper presents a method for inter-group confrontation and intra-group cooperation between a predator group and a prey group, and uses it to construct a multi-group multi-agent system. The motion of the prey group is modeled with a flocking control algorithm, so that once predators are detected, the prey cooperatively avoid them while keeping the group intact. The predator group makes autonomous decisions using distributed reinforcement learning. To share learning experience efficiently among the predator agents and accelerate convergence, we propose a distributed cooperative reinforcement learning algorithm with variable weights. Simulations demonstrate the feasibility of the proposed method.
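As a rough illustration of what experience sharing with variable weights could look like, the sketch below pairs a standard tabular Q-learning update with a hypothetical merge step. The abstract does not specify the weighting rule, so weighting each agent's estimate by its visit count is purely an assumption, as are the names `q_update` and `share` and the default values of `alpha` and `gamma`. It should not be read as the algorithm proposed in the paper.

```python
from typing import Dict, List, Tuple

State = int
Action = int
Key = Tuple[State, Action]
QTable = Dict[Key, float]


def q_update(q: QTable, s: State, a: Action, reward: float, s_next: State,
             actions: List[Action], alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Standard tabular Q-learning step (unseen entries default to 0.0):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (reward + gamma * best_next - old)


def share(q_tables: List[QTable], visit_counts: List[Dict[Key, int]]) -> QTable:
    """Hypothetical sharing step (an assumption, not the paper's rule):
    merge the agents' Q-tables entry by entry, weighting each agent's
    estimate by how often that agent visited the (state, action) pair."""
    keys = set().union(*(q.keys() for q in q_tables))
    merged: QTable = {}
    for key in keys:
        weights = [counts.get(key, 0) for counts in visit_counts]
        total = sum(weights)
        if total == 0:
            continue  # no agent has experience for this entry
        merged[key] = sum(w * q.get(key, 0.0)
                          for w, q in zip(weights, q_tables)) / total
    return merged
```

In this sketch, an agent that has visited a state-action pair more often contributes more to the merged estimate, which is one simple way to make the weights depend on experience. Other choices, such as weights based on recent reward, would fit the same interface.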
Additional information
This work was supported by the National Key Research and Development Program of China (Project No. 2017YFB0503400) and by the National Natural Science Foundation of China under Grants 61371182 and 41301459.
Gang Wang received his B.E. degree in communication engineering and a Ph.D. degree in biomedical engineering from University of Electronic Science and Technology of China, Chengdu, China, in 1999 and 2008, respectively. He is an Associate Professor in the School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu. His current research interests include distributed signal processing and intelligent systems.
Jian Xiao was born in Hengyang, Hunan Province, China. He received his B.Eng. degree in electronic information science and technology from the Physics and Electrical Engineering College of Jishou University, Hunan, China, in 2016, and his M.Eng. degree from the School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2020. He is currently pursuing a Ph.D. degree in the School of Information and Communication Engineering of UESTC.
Rui Xue received his B.S. and Ph.D. degrees from Beihang University in 2001 and 2010, respectively. He is a professor in the School of Electronics and Information Engineering, Beihang University. His research interests are satellite navigation error modeling, estimation, and validation. His current research interests include statistical signal processing and intelligent unmanned systems.
Yongting Yuan was born in Fugou, Henan Province, in 1972. He received his M.S. degree from Northeastern University in 2005. He is now a senior engineer of No. 31435 research institute. His research interests include air traffic control technology.
About this article
Cite this article
Wang, G., Xiao, J., Xue, R. et al. A Multi-group Multi-agent System Based on Reinforcement Learning and Flocking. Int. J. Control Autom. Syst. 20, 2364–2378 (2022). https://doi.org/10.1007/s12555-021-0170-5
DOI: https://doi.org/10.1007/s12555-021-0170-5