Abstract
Reinforcement learning is a widely used and reliable approach to solving many problems, and temporal difference methods are among the most successful members of the reinforcement learning family. Their main weakness, shared with reinforcement learning methods in general, is a slow convergence rate, and many studies have been devoted to accelerating it. One proposed remedy is eligibility traces. Owing to the nature of off-policy methods, however, combining eligibility traces with them requires special care: in Watkins's method, one of the dominant eligibility-trace algorithms, the traces are cut whenever an exploratory action is taken, which diminishes the benefit of the traces during the early stages of learning, when exploration is frequent. In this study, we propose a framework for combining eligibility traces with off-policy methods that exploits the information gathered during the agent's exploratory actions: the decision whether to apply or cut the eligibility traces during an exploratory action is made by fuzzy adaptation. We apply this method to goal finding in static and dynamic grid worlds, compare it against state-of-the-art techniques, and show that it outperforms them in both average achieved reward and convergence time.
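To make the mechanism concrete, the following is a minimal sketch of tabular Watkins's Q(λ) with a hook at the point where the proposed adaptive decision would be made. It assumes a grid-world environment object exposing reset() and step(a) returning (next_state, reward, done); the fuzzy_keep_traces function is a hypothetical placeholder standing in for the paper's fuzzy inference system, not the authors' actual rules.

```python
import numpy as np

def fuzzy_keep_traces(episode, total_episodes):
    """Hypothetical placeholder for the paper's fuzzy adaptation:
    early in learning, keep traces through exploratory actions more
    often; later, fall back to classic Watkins-style cutting."""
    keep_prob = max(0.0, 1.0 - episode / total_episodes)
    return np.random.rand() < keep_prob

def adaptive_watkins(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.95, lam=0.9, eps=0.1):
    Q = np.zeros((n_states, n_actions))
    for ep in range(episodes):
        E = np.zeros_like(Q)                 # eligibility traces, reset per episode
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            greedy_a = int(np.argmax(Q[s]))
            a = np.random.randint(n_actions) if np.random.rand() < eps else greedy_a
            exploratory = (a != greedy_a)    # classic Watkins cuts traces here
            s_next, r, done = env.step(a)
            a_star = int(np.argmax(Q[s_next]))           # greedy successor action
            target = r + (0.0 if done else gamma * Q[s_next, a_star])
            delta = target - Q[s, a]
            E[s, a] += 1.0                   # accumulating trace
            Q += alpha * delta * E           # update all traced state-action pairs
            if exploratory and not fuzzy_keep_traces(ep, episodes):
                E[:] = 0.0                   # cut traces on exploration (Watkins)
            else:
                E *= gamma * lam             # otherwise decay traces as usual
            s = s_next
    return Q
```

In classic Watkins's Q(λ) the traces are zeroed on every exploratory step; the fuzzy gate above marks the single point where the proposed method intervenes, letting traces survive exploration when that is judged beneficial.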
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Shokri, M., Khasteh, S.H. & Aminifar, A. Adaptive Fuzzy Watkins: A New Adaptive Approach for Eligibility Traces in Reinforcement Learning. Int. J. Fuzzy Syst. 21, 1443–1454 (2019). https://doi.org/10.1007/s40815-019-00633-x