Reinforcement learning as a rehearsal for swarm foraging

Nguyen, Trung; Banerjee, Bikramjit

doi:10.1007/s11721-021-00203-8

Published: 29 September 2021

Volume 16, pages 29–58 (2022)
Swarm Intelligence

Abstract

Foraging in a swarm of robots has been investigated by many researchers, and the prevalent techniques have been hand-designed algorithms whose parameters are often tuned via machine learning. Our departure point is one such algorithm, in which we replace a hand-coded decision procedure with reinforcement learning (RL), resulting in significantly superior performance. We situate our approach within the reinforcement learning as a rehearsal (RLaR) framework that we recently introduced. We instantiate RLaR for the foraging problem and experimentally show that a key component of RLaR, a conditional probability distribution function, can be modeled as a uni-modal distribution (with a lower memory footprint) despite evidence that it is multi-modal. Our experiments also show that the learned behavior has some degree of scalability in terms of variations in the swarm size or the environment.

Notes

The \(\epsilon\)-greedy strategy selects a (greedy) action according to the learned Q-value function with a high probability, \(1-\epsilon\). With the remaining small probability, \(\epsilon\), the agent selects a random action in order to explore the environment.
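
The selection rule in this note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `epsilon_greedy`, the list-of-floats representation of Q-values, and the uniform choice over all actions during exploration are assumptions made for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen by an epsilon-greedy rule.

    q_values: list of estimated Q-values, one per action (assumed layout).
    epsilon:  exploration probability in [0, 1].
    rng:      any object with random() and randrange(), e.g. random.Random(seed).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if not q_values:
        raise ValueError("q_values must not be empty")

    # With probability epsilon: explore by picking an action uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))

    # Otherwise: exploit by picking the action with the highest Q-value
    # (ties resolve to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    q = [0.1, 0.9, 0.3]     # hypothetical Q-values for three actions
    picks = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]
    print("fraction of greedy picks:", picks.count(1) / len(picks))
```

Because the exploratory draw in this sketch is uniform over all actions, it can also land on the greedy action, so the greedy action is chosen slightly more often than \(1-\epsilon\) of the time.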

Acknowledgements

We thank the anonymous reviewers and editors for helpful comments and suggestions. This work was supported in part by National Science Foundation Grant IIS-1526813.

Author information

Authors and Affiliations

Department of Computer Science, Winona State University, 175 West Mark Street, Winona, MN, 55987, USA
Trung Nguyen
School of Computing Sciences and Computer Engineering, The University of Southern Mississippi, 118 College Dr. #5106, Hattiesburg, MS, 39406, USA
Bikramjit Banerjee

Corresponding author

Correspondence to Bikramjit Banerjee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nguyen, T., Banerjee, B. Reinforcement learning as a rehearsal for swarm foraging. Swarm Intell 16, 29–58 (2022). https://doi.org/10.1007/s11721-021-00203-8

Received: 05 September 2020
Accepted: 13 September 2021
Published: 29 September 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11721-021-00203-8
