Reinforcement learning as a rehearsal for swarm foraging

Swarm Intelligence

Abstract

Foraging in robot swarms has been widely studied, and the prevalent techniques are hand-designed algorithms whose parameters are often tuned via machine learning. Our departure point is one such algorithm, in which we replace a hand-coded decision procedure with reinforcement learning (RL), resulting in significantly superior performance. We situate our approach within the reinforcement learning as a rehearsal (RLaR) framework, which we recently introduced. We instantiate RLaR for the foraging problem and show experimentally that a key component of RLaR, a conditional probability distribution function, can be modeled as a uni-modal distribution (with a lower memory footprint) despite evidence that it is multi-modal. Our experiments also show that the learned behavior scales to some degree with variations in the swarm size or the environment.

Notes

  1. The \(\epsilon \)-greedy strategy is to select a (greedy) action according to the learned Q-value function with a high probability, \((1-\epsilon )\). With the remaining small probability, \(\epsilon \), the agent selects a random action in order to explore the environment.
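The selection rule in this note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `epsilon_greedy`, the list-of-floats representation of Q-values, uniform sampling during exploration, and breaking ties toward the lowest action index are all assumptions made for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen by an epsilon-greedy rule.

    q_values: Q-value estimates, one per action (hypothetical layout).
    epsilon:  exploration probability in [0, 1].
    rng:      any object with random() and randrange(), e.g. random.Random(seed).
    """
    # With probability epsilon, explore: pick any action
    # (uniform sampling is an assumption of this sketch).
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Otherwise exploit: pick the action with the largest Q-value
    # (ties go to the lowest index in this sketch).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting `epsilon` to 0 makes the rule purely greedy, while setting it to 1 makes every choice exploratory; passing a seeded `random.Random` instance as `rng` makes runs repeatable.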

Acknowledgements

We thank the anonymous reviewers and editors for helpful comments and suggestions. This work was supported in part by National Science Foundation Grant IIS-1526813.

Author information

Corresponding author

Correspondence to Bikramjit Banerjee.

About this article

Cite this article

Nguyen, T., Banerjee, B. Reinforcement learning as a rehearsal for swarm foraging. Swarm Intell 16, 29–58 (2022). https://doi.org/10.1007/s11721-021-00203-8

  • DOI: https://doi.org/10.1007/s11721-021-00203-8
