Heuristic Search Based Exploration in Reinforcement Learning

  • Ngo Anh Vien
  • Nguyen Hoang Viet
  • SeungGwan Lee
  • TaeChoong Chung
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4507)


In this paper, we consider reinforcement learning in systems with an unknown environment, where the agent must trade off efficiently between exploration (long-term optimization) and exploitation (short-term optimization). The ε-greedy algorithm is a near-greedy action selection rule: it behaves greedily (exploitation) most of the time, but every once in a while, with small probability ε, it instead selects an action at random (exploration). Many works have shown that purely random exploration drives the agent towards poorly modeled states. This study therefore evaluates the role of heuristic-based exploration in reinforcement learning. We propose three methods: neighborhood search based exploration, simulated annealing based exploration, and tabu search based exploration. All three techniques follow the same rule: "Explore the least-visited state." In simulation, these techniques are evaluated and compared on a discrete reinforcement learning task (robot navigation).
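The contrast the abstract draws can be sketched in a few lines. Below is a minimal illustration of ε-greedy selection next to a count-based variant that, when it does explore, picks the least-visited action rather than a uniformly random one. The function names and the Q-value/visit-count representation are assumptions for illustration, not the paper's implementation.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Near-greedy selection: exploit the highest-valued action with
    probability 1 - epsilon, otherwise pick uniformly at random."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # random exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

def least_visited_greedy(q_values, visit_counts, epsilon=0.1):
    """Heuristic variant of the "explore the least-visited state" rule:
    when exploring, prefer the action tried the fewest times."""
    if random.random() < epsilon:
        return min(range(len(visit_counts)), key=lambda a: visit_counts[a])
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

The heuristic variant directs exploration toward under-sampled choices instead of scattering it uniformly, which is the intuition the paper's neighborhood search, simulated annealing, and tabu search methods build on.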





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ngo Anh Vien (1)
  • Nguyen Hoang Viet (1)
  • SeungGwan Lee (1)
  • TaeChoong Chung (1)

  1. Artificial Intelligence Lab, Department of Computer Engineering, School of Electronics and Information, Kyunghee University, 1-Seocheon, Giheung, Yongin, Gyeonggi, 446-701, South Korea
