An Improved Q-Learning Algorithm Using Synthetic Pheromones

  • Ndedi Monekosso
  • Paolo Remagnino
  • Adam Szarowicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2296)


In this paper we propose an algorithm for multi-agent Q-learning. The algorithm is inspired by the natural behaviour of ants, which deposit pheromone in the environment to communicate. Beyond simulating ant behaviour in a colony, the approach is intended to support the design of complex multi-agent systems, in which complex behaviour can emerge from relatively simple interacting agents. The proposed Q-learning update equation includes a belief factor, which reflects the confidence an agent has in the pheromone it detects in its environment. Agents communicate implicitly to co-ordinate and co-operate in learning to solve a problem.
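
The abstract does not state the update equation itself. As a rough illustration only, the sketch below adds a pheromone-derived belief term to the standard tabular Q-learning update. The belief definition (pheromone at the next state divided by the total over its neighbouring states), the weight `xi`, and every function and parameter name are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def belief(pheromone, state, neighbours):
    """Hypothetical belief factor: the share of the neighbourhood's
    pheromone that sits at `state` (0.0 if no pheromone is present)."""
    total = sum(pheromone.get(n, 0.0) for n in neighbours)
    return pheromone.get(state, 0.0) / total if total > 0 else 0.0


def q_update(q, pheromone, state, action, reward, next_state,
             actions, neighbours, alpha=0.1, gamma=0.9, xi=0.5):
    """One tabular Q-learning step with an assumed belief bonus.

    Standard target:  reward + gamma * max_a' Q(next_state, a')
    Assumed variant:  reward + gamma * (max_a' Q(next_state, a') + xi * belief)
    """
    best_next = max(q[(next_state, a)] for a in actions)
    bonus = xi * belief(pheromone, next_state, neighbours)
    target = reward + gamma * (best_next + bonus)
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Toy usage: two neighbouring cells, most of the pheromone on "s1".
q_table = defaultdict(float)
pheromone_map = {"s1": 3.0, "s2": 1.0}
new_value = q_update(q_table, pheromone_map, "s0", "right", 1.0, "s1",
                     actions=["left", "right"], neighbours=["s1", "s2"],
                     alpha=0.5, gamma=0.9, xi=0.4)
print(new_value)
```

Setting `xi=0` recovers the ordinary Q-learning update, so in this sketch the parameter controls how much weight an agent gives to the pheromone left by other agents.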


Machine Learning, Multi-agents, Pheromones, Coordination, Communication





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Ndedi Monekosso (1)
  • Paolo Remagnino (1)
  • Adam Szarowicz (1)
  1. Digital Imaging Research Centre, School of Computing and Information Systems, Kingston University, UK
