Machine Learning, Volume 84, Issue 1–2, pp 7–49

On the analysis and design of software for reinforcement learning, with a survey of existing systems

Abstract

Reinforcement Learning (RL) is a very complex domain and software for RL is correspondingly complex. We analyse the scope, requirements, and potential for RL software, discuss relevant design issues, survey existing software, and make recommendations for designers. We argue that broad and flexible libraries of reusable software components are valuable from a scientific, as well as practical, perspective, as they allow precise control over experimental conditions, encourage comparison of alternative methods, and allow a fuller exploration of the RL domain.

Keywords

Reinforcement learning; Software engineering

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Intelligent Systems Laboratory, University of Bristol, Bristol, UK
