
Clustering subspace generalization to obtain faster reinforcement learning

Original Paper · Evolving Systems

Abstract

In reinforcement learning, weak or absent spatial generalization leads to slow learning. One way to reduce the number of interactions required is to exploit experience generalization in subspaces, i.e., lower-dimensional projections of the original state representation. The goal of this paper is to mitigate the perceptual aliasing that these subspaces introduce and thereby strengthen the benefit of their generalization. We augment the subspaces with an extra dimension obtained from a clustering process over the state space. In the resulting framework, called Clustered-Model Based Learning with Subspaces (C-MoBLeS), states with similar policies are assigned to the same cluster, and this localization lets the agent exploit subspace generalization more effectively: the agent generalizes only over subspace experience belonging to the cluster of its current state. Several experiments show that C-MoBLeS substantially increases the learning speed.
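To make the clustering-plus-subspace idea in the abstract concrete, the sketch below is a minimal, hypothetical illustration rather than the authors' implementation (which is available in the repository linked in the Notes). It assumes a tabular setting and uses a plain k-means grouping of per-state action-value vectors as a stand-in for "states with similar policies"; the names q_table, dims, and n_clusters are illustrative assumptions.

import numpy as np

def cluster_states_by_policy(q_table, n_clusters=4, n_iter=20, seed=0):
    """Group states whose action-value vectors (and hence greedy policies)
    look similar. A simple k-means stand-in for the paper's clustering step;
    q_table maps a state tuple to an array of Q-values, one per action."""
    rng = np.random.default_rng(seed)
    states = sorted(q_table.keys())                      # list of state tuples
    feats = np.array([q_table[s] for s in states])       # shape: |S| x |A|
    centers = feats[rng.choice(len(states), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign every state to its nearest cluster centre
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centre to the mean of its members (skip empty clusters)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(axis=0)
    return {s: int(k) for s, k in zip(states, labels)}

def augmented_subspace_key(state, dims, cluster_of):
    """Project a full state onto a subspace (a subset of its dimensions) and
    append the state's cluster id, so subspace experience is generalized only
    among states that share a cluster (the 'localization' in the abstract)."""
    return tuple(state[d] for d in dims) + (cluster_of[state],)

# Toy usage: a 2-D grid state (x, y); the subspace keeps only the x dimension.
# q_table here is random placeholder data; in practice it would come from the
# agent's learned model.
q_table = {(x, y): np.random.rand(4) for x in range(5) for y in range(5)}
cluster_of = cluster_states_by_policy(q_table, n_clusters=3)
key = augmented_subspace_key((2, 4), dims=(0,), cluster_of=cluster_of)
print(key)   # e.g. (2, 1): x-coordinate plus cluster id

Under these assumptions, subspace statistics (e.g., model counts or value estimates) would be keyed by the augmented tuple, so experience gathered in one cluster does not alias with experience from states that follow a different policy.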


Notes

  1. The code for reproducing the experimental results can be downloaded from https://github.com/MHashemzadeh/C-MoBLeS.git.


Author information


Corresponding author

Correspondence to Maryam Hashemzadeh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hashemzadeh, M., Hosseini, R. & Ahmadabadi, M.N. Clustering subspace generalization to obtain faster reinforcement learning. Evolving Systems 11, 89–103 (2020). https://doi.org/10.1007/s12530-019-09290-9

