Recommended Reading
Abounadi J, Bertsekas DP, Borkar V (2002) Stochastic approximation for non-expansive maps: application to Q-learning algorithms. SIAM J Control Optim 41(1):1–22
Barto AG, Bradtke SJ, Singh SP (1995) Learning to act using real-time dynamic programming. Artif Intell 72(1):81–138
Bertsekas DP (1995) Dynamic programming and optimal control. Athena Scientific, Belmont
Brafman RI, Tennenholtz M (2002) R-MAX – a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213–231
Crites RH, Barto AG (1998) Elevator group control using multiple reinforcement agents. Mach Learn 33(2/3):235–262
Ghavamzadeh M, Mahadevan S (2006) Hierarchical average reward reinforcement learning. J Mach Learn Res 13(2):197–229
Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn 49(2/3):209–232
Mahadevan S (1996) Average reward reinforcement learning: foundations, algorithms, and empirical results. Mach Learn 22(1/2/3):159–195
Marbach P, Mihatsch O, Tsitsiklis JN (2000) Call admission control and routing in integrated service networks using neuro-dynamic programming. IEEE J Sel Areas Commun 18(2):197–208
Proper S, Tadepalli P (2006) Scaling model-based average-reward reinforcement learning for product delivery. In: European conference on machine learning, Berlin. Springer, pp 725–742
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Schwartz A (1993) A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the tenth international conference on machine learning, Amherst. Morgan Kaufmann, San Mateo, pp 298–305
Seri S, Tadepalli P (2002) Model-based hierarchical average-reward reinforcement learning. In: Proceedings of the 19th international conference on machine learning, Sydney. Morgan Kaufmann, pp 562–569
Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Tadepalli P, Ok D (1998) Model-based average-reward reinforcement learning. Artif Intell 100:177–224
Tesauro G (1992) Practical issues in temporal difference learning. Mach Learn 8(3–4):257–277
Tsitsiklis J, Van Roy B (1999) Average cost temporal-difference learning. Automatica 35(11):1799–1808
Van Roy B, Tsitsiklis J (2002) On average versus discounted temporal-difference learning. Mach Learn 49(2/3):179–191
Wang G, Mahadevan S (1999) Hierarchical optimization of policy-coupled semi-Markov decision processes. In: Proceedings of the 16th international conference on machine learning, Bled, pp 464–473
Copyright information
© 2014 Springer Science+Business Media New York
Cite this entry
Tadepalli, P. (2014). Average-Reward Reinforcement Learning. In: Sammut, C., Webb, G. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7502-7_17-1
Publisher Name: Springer, Boston, MA
Online ISBN: 978-1-4899-7502-7
eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering