An Analysis of the Pheromone Q-Learning Algorithm
The Phe-Q machine learning technique, a modified Q-learning technique, was developed to enable co-operating agents to communicate while learning to solve a problem. Phe-Q learning combines Q-learning with synthetic pheromone to improve the speed of convergence. The Phe-Q update equation includes a belief factor that reflects the confidence the agent has in the pheromone (the communication) deposited in the environment by other agents. With the Phe-Q update equation, the speed of convergence towards an optimal solution depends on a number of parameters, including the number of agents solving a problem, the amount of pheromone deposited, and the evaporation rate. This paper describes work carried out to optimise the speed of learning with the Phe-Q technique. The objective was to optimise Phe-Q learning with respect to pheromone deposition rates and evaporation rates.
Keywords: Evaporation Rate, Travel Salesman Problem, Pheromone Trail, Synthetic Pheromone, Belief Factor
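The abstract does not reproduce the Phe-Q update equation, so the following Python sketch is only an illustration of the idea it describes: a Q-learning update whose look-ahead term is augmented by a belief factor computed from pheromone deposited by other agents. The exact form of the update, the normalisation used for the belief factor, the proportional evaporation model, and all names (`phe_q_update`, `belief`, `xi`, `next_moves`) are assumptions made for this example, not the authors' definitions.

```python
from collections import defaultdict


def deposit(pheromone, state, amount):
    """Add `amount` of synthetic pheromone at `state` (assumed additive deposit)."""
    pheromone[state] += amount


def evaporate(pheromone, rate):
    """Decay every pheromone level by the fraction `rate` (assumed evaporation model)."""
    for state in pheromone:
        pheromone[state] *= (1.0 - rate)


def belief(pheromone, state, neighbours):
    """Pheromone at `state` as a share of the total over `neighbours`.

    This normalisation is an assumption; it returns 0.0 when no pheromone
    is present anywhere in the neighbourhood.
    """
    total = sum(pheromone[n] for n in neighbours)
    return pheromone[state] / total if total > 0 else 0.0


def phe_q_update(q, pheromone, s, a, reward, s_next, next_moves,
                 alpha=0.1, gamma=0.9, xi=0.5):
    """One hypothetical Phe-Q style update of q[(s, a)].

    `next_moves` maps each action available in `s_next` to the state it
    leads to. `xi` weights the belief factor against the Q-value.
    """
    neighbours = list(next_moves.values())
    best = max(
        (q[(s_next, a2)] + xi * belief(pheromone, s2, neighbours)
         for a2, s2 in next_moves.items()),
        default=0.0,
    )
    q[(s, a)] += alpha * (reward + gamma * best - q[(s, a)])
    return q[(s, a)]


if __name__ == "__main__":
    q = defaultdict(float)          # Q-table, zero-initialised
    pheromone = defaultdict(float)  # pheromone level per state
    deposit(pheromone, "B", 2.0)
    deposit(pheromone, "C", 1.0)
    evaporate(pheromone, 0.1)
    value = phe_q_update(q, pheromone, "A", "right", 1.0, "A2",
                         {"x": "B", "y": "C"})
    print(value)
```

In this sketch, setting `xi` to zero removes the pheromone term and leaves a standard Q-learning update, while the deposit amount and evaporation rate control how strongly and for how long other agents' trails influence the learner. These are the parameters the abstract names as affecting convergence speed.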