Effectiveness of Considering State Similarity for Reinforcement Learning

  • Sertan Girgin
  • Faruk Polat
  • Reda Alhajj
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)


This paper presents a novel approach that locates states with similar sub-policies and incorporates them into the reinforcement learning framework for better learning performance. This is achieved by identifying common action sequences of states, which are derived from possible optimal policies and reflected into a tree structure. Based on the number of such shared sequences, we define a similarity function between two states, which lets an update to the action-value function of one state be reflected onto all similar states. In this way, experience acquired during learning can be applied in a broader context. The effectiveness of the method is demonstrated empirically.
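The mechanism described above can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal illustration assuming a tabular setting, where the "tree structure" is represented simply as the set of action-sequence prefixes observed from each state, similarity is the fraction of shared sequences, and a TD update is propagated to sufficiently similar states scaled by their similarity. The function names (`action_sequences`, `similarity`, `propagate_update`) and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict

def action_sequences(episodes, max_len=3):
    # Collect the set of action-sequence prefixes (up to max_len)
    # observed starting from each state across a batch of episodes.
    # Each episode is a list of (state, action) pairs.
    seqs = defaultdict(set)
    for episode in episodes:
        for i, (s, _) in enumerate(episode):
            actions = tuple(a for _, a in episode[i:i + max_len])
            for l in range(1, len(actions) + 1):
                seqs[s].add(actions[:l])
    return seqs

def similarity(seqs, s1, s2):
    # Similarity of two states: fraction of shared action sequences
    # relative to the larger sequence set (an assumed, simple choice).
    a, b = seqs[s1], seqs[s2]
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))

def propagate_update(Q, seqs, s, a, delta, threshold=0.5):
    # Apply a TD update delta at (s, a), then reflect it onto every
    # sufficiently similar state, scaled by the similarity value.
    Q[(s, a)] = Q.get((s, a), 0.0) + delta
    for s2 in seqs:
        if s2 == s:
            continue
        sim = similarity(seqs, s, s2)
        if sim >= threshold:
            Q[(s2, a)] = Q.get((s2, a), 0.0) + sim * delta
    return Q

# Two states 'A' and 'B' from which the same action sequence is taken
# are judged fully similar, so an update at 'A' also reaches 'B'.
episodes = [[('A', 'x'), ('A2', 'y')], [('B', 'x'), ('B2', 'y')]]
seqs = action_sequences(episodes)
Q = propagate_update({}, seqs, 'A', 'x', 1.0)
```

Here an update to state 'A' is mirrored onto state 'B' because both exhibit the identical action sequence (x, y), which is the intuition behind applying experience "in a broader context".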


Keywords: Optimal Policy, Reinforcement Learning, Action Sequence, Path Tree, Reinforcement Learning Algorithm





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sertan Girgin (1, 2)
  • Faruk Polat (1)
  • Reda Alhajj (2, 3)
  1. Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
  2. Department of Computer Science, University of Calgary, Calgary, Canada
  3. Department of Computer Science, Global University, Beirut, Lebanon
