
Reinforcement learning algorithm for non-stationary environments

Abstract

Reinforcement learning (RL) methods learn optimal decisions under the assumption of a stationary environment. This stationarity assumption is, however, restrictive: in many real-world problems, such as traffic signal control and robotic applications, one often encounters non-stationary environments, and in these scenarios standard RL methods yield sub-optimal decisions. In this paper, we therefore consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment, where the goal is to maximize the long-term discounted reward accrued while the underlying model of the environment changes over time. To achieve this, we first adapt a change point algorithm to detect changes in the statistics of the environment and then develop an RL algorithm that maximizes the long-run reward accrued. We show that our change point method detects changes in the model of the environment effectively and thus enables the RL algorithm to maximize the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.
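The two-stage recipe the abstract describes (monitor the statistics of the environment for a change point, then let the learner re-estimate values once a change is declared) can be illustrated with a small sketch. The toy below is our own illustration, not the authors' algorithm: the paper adapts a multivariate change point detection method and handles full Markov decision processes, whereas this sketch uses a two-armed bandit and a simple sliding-window mean test. It shows why detection helps: without forgetting, stale value estimates keep favoring the pre-change action.

```python
import random

def window_means_differ(rewards, window=20, threshold=0.5):
    """Declare a change when the mean of the newest window of rewards
    differs from the mean of the window before it by more than `threshold`."""
    if len(rewards) < 2 * window:
        return False
    recent = sum(rewards[-window:]) / window
    past = sum(rewards[-2 * window:-window]) / window
    return abs(recent - past) > threshold

def run_bandit(seed=0, horizon=1000, change_at=500, eps=0.1, alpha=0.1):
    """Two-armed non-stationary bandit: arm 0 pays ~1 before `change_at`,
    arm 1 pays ~1 after it.  Epsilon-greedy value estimates are kept per
    arm; when the detector fires, the stale estimates are discarded."""
    rng = random.Random(seed)
    q = [0.0, 0.0]                      # value estimate per arm
    history, detected_at = [], None
    for t in range(horizon):
        if rng.random() < eps:          # explore
            a = rng.randrange(2)
        else:                           # exploit the current estimates
            a = 0 if q[0] >= q[1] else 1
        best_arm = 0 if t < change_at else 1
        r = (1.0 if a == best_arm else 0.0) + rng.gauss(0.0, 0.05)
        q[a] += alpha * (r - q[a])      # running-average value update
        history.append(r)
        if detected_at is None and window_means_differ(history):
            detected_at = t             # change declared: forget old values
            q = [0.0, 0.0]
            history.clear()
    return q, detected_at
```

The window length and threshold trade off detection delay against false alarms: shorter windows react faster to a model change but are noisier, which is exactly the trade-off any change point detector for RL must balance.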



Notes

  1. https://cran.r-project.org/web/packages/MDPtoolbox

  2. http://vision-traffic.ptvgroup.com/


Author information

Corresponding author: Sindhu Padakandla.



Cite this article

Padakandla, S., Prabuchandran, K.J. & Bhatnagar, S. Reinforcement learning algorithm for non-stationary environments. Appl Intell 50, 3590–3606 (2020). https://doi.org/10.1007/s10489-020-01758-5


Keywords

  • Markov decision processes
  • Reinforcement learning
  • Non-stationary environments
  • Change detection