Heuristic Search Based Exploration in Reinforcement Learning
In this paper, we consider reinforcement learning in systems with an unknown environment, where the agent must trade off efficiently between exploration (long-term optimization) and exploitation (short-term optimization). The ε-greedy algorithm is a near-greedy action-selection rule: it behaves greedily (exploitation) most of the time, but with small probability ε it instead selects an action at random (exploration). Prior work has shown that purely random exploration can drive the agent towards poorly modeled states. This study therefore evaluates the role of heuristic-based exploration in reinforcement learning. We propose three methods: neighborhood-search-based exploration, simulated-annealing-based exploration, and tabu-search-based exploration. All three techniques follow the same rule: "Explore the least-visited state." In simulation, these techniques are evaluated and compared on a discrete reinforcement learning task (robot navigation).
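The two selection rules contrasted in the abstract can be sketched compactly. The snippet below is a minimal illustration, not the paper's implementation: the first function is standard ε-greedy selection over a list of Q-values; the second is an assumed count-based reading of the "explore the least-visited state" rule, where `visit_counts` is a hypothetical table mapping each action to how often its successor state has been visited.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Near-greedy selection: exploit the best-known action with
    probability 1 - epsilon, otherwise pick an action at random."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploit

def least_visited(actions, visit_counts):
    """Heuristic exploration sketch: among the candidate actions,
    choose the one leading to the least-visited successor state."""
    return min(actions, key=lambda a: visit_counts[a])
```

With ε = 0 the first rule is purely greedy, while the second ignores value estimates entirely and steers the agent towards under-explored states; the paper's three heuristics can be seen as different ways of scheduling or constraining this count-driven choice.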