
A Review of Reinforcement Learning Methods

  • Chapter
Data Mining and Knowledge Discovery Handbook

Summary

Reinforcement learning is learning, through trial and error, how best to react to situations. In the machine learning community, reinforcement learning is studied with respect to artificial (machine) decision makers, referred to as agents. An agent is assumed to be situated in an environment that behaves as a Markov decision process. This chapter provides a brief introduction to reinforcement learning and establishes its relation to data mining. Specifically, the reinforcement learning problem is defined; a few key ideas for solving it are described; the relevance to data mining is explained; and an instructive example is presented.
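The setting described above — an agent learning by trial and error in a Markov decision process — is commonly addressed with value-function methods such as tabular Q-learning. The sketch below is illustrative only and is not drawn from the chapter itself: the corridor MDP, its reward scheme, and the hyperparameters are assumptions chosen to keep the example minimal.

```python
import random

# Illustrative toy MDP (an assumption, not from the chapter): a corridor of
# 5 states. The agent moves left or right and receives reward 1 only on
# reaching the rightmost (goal) state, which ends the episode.
N_STATES = 5
ACTIONS = (0, 1)                        # 0 = left, 1 = right
GOAL = N_STATES - 1

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def greedy(s):
    # Break ties randomly so the untrained agent does not get stuck.
    best = max(Q[s])
    return random.choice([a for a in ACTIONS if Q[s][a] == best])

for episode in range(300):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s2, r, done = step(s, a)
        # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Extract the learned greedy policy; in every non-goal state it should
# move right, toward the reward.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy[:GOAL])
```

Because the environment is deterministic and tiny, a few hundred episodes suffice for the greedy policy to point at the goal from every non-goal state; in larger or stochastic settings, the same update rule applies but convergence requires more care (exploration schedules, decaying learning rates).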




Author information

Correspondence to Oded Maimon.


Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Maimon, O., Cohen, S. (2009). A Review of Reinforcement Learning Methods. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_20

  • DOI: https://doi.org/10.1007/978-0-387-09823-4_20

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-09822-7

  • Online ISBN: 978-0-387-09823-4

  • eBook Packages: Computer Science (R0)
