Abstract
The coordination of unmanned air and ground vehicles has been an active research area because of its significant advantages: unmanned aerial vehicles (UAVs) have a wide field of view, enabling them to effectively guide a swarm of unmanned ground vehicles (UGVs). Owing to recent advances in artificial intelligence (AI), autonomous agents are being used to design more robust air–ground coordination, reducing the workload on human operators and increasing the autonomy of unmanned air–ground systems. A guidance-and-control shepherding design allows a single learning agent to influence and manage a larger swarm of rule-based entities. In this chapter, we present a learning algorithm for a sky shepherd guiding rule-based AI-driven UGVs. We introduce the apprenticeship bootstrapping learning algorithm and apply it to the aerial shepherding task.
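The shepherding setting described above can be illustrated with a minimal sketch: rule-based ground agents are repelled by a single shepherd, and the shepherd steers toward a driving point placed behind the flock's centre of mass, opposite the goal. This is a simplified, hypothetical toy model for illustration only (all function names, radii, and update rules here are assumptions, not the chapter's actual algorithm or the Strömbom model's full heuristics).

```python
import numpy as np

def shepherd_step(sheep, shepherd, goal, r_repel=2.0, r_drive=1.5, dt=0.1):
    """One update of a toy shepherding model (illustrative only).

    Sheep within r_repel of the shepherd take a unit step directly away
    from it; the shepherd steps toward a driving point placed r_drive
    behind the flock centre, on the line away from the goal.
    """
    sheep = np.asarray(sheep, dtype=float)
    shepherd = np.asarray(shepherd, dtype=float)
    goal = np.asarray(goal, dtype=float)

    # Rule-based sheep dynamics: repulsion from the shepherd only.
    new_sheep = sheep.copy()
    for i, s in enumerate(sheep):
        offset = s - shepherd
        dist = np.linalg.norm(offset)
        if 0 < dist < r_repel:
            new_sheep[i] = s + dt * offset / dist

    # Shepherd dynamics: approach the driving point behind the flock,
    # so that repelled sheep are pushed toward the goal.
    centre = new_sheep.mean(axis=0)
    away = centre - goal
    norm = np.linalg.norm(away)
    drive_point = centre + r_drive * (away / norm if norm > 0 else 0.0)
    to_drive = drive_point - shepherd
    d = np.linalg.norm(to_drive)
    new_shepherd = shepherd + dt * (to_drive / d if d > 0 else 0.0)
    return new_sheep, new_shepherd
```

Iterating this step from a shepherd position beyond the flock moves the flock centre toward the goal; in the chapter's setting, the hand-coded shepherd policy sketched here is what the learning agent replaces.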
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nguyen, H., Garratt, M., Abbass, H.A. (2021). Apprenticeship Bootstrapping Reinforcement Learning for Sky Shepherding of a Ground Swarm in Gazebo. In: Abbass, H.A., Hunjet, R.A. (eds) Shepherding UxVs for Human-Swarm Teaming. Unmanned System Technologies. Springer, Cham. https://doi.org/10.1007/978-3-030-60898-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60897-2
Online ISBN: 978-3-030-60898-9
eBook Packages: Engineering (R0)