MapReduce for Parallel Reinforcement Learning

  • Yuxi Li
  • Dale Schuurmans
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7188)


We investigate the parallelization of reinforcement learning algorithms using MapReduce, a popular parallel computing framework. We present parallel versions of several dynamic programming algorithms, including policy evaluation, policy iteration, and off-policy updates. Furthermore, we design parallel reinforcement learning algorithms to deal with large scale problems using linear function approximation, including model-based projection, least squares policy iteration, temporal difference learning and recent gradient temporal difference learning algorithms. We give time and space complexity analysis of the proposed algorithms. This study demonstrates how parallelization opens new avenues for solving large scale reinforcement learning problems.


Nonnegative Matrix Factorization Policy Iteration Reinforcement Learning Algorithm Policy Improvement Eligibility Trace 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Massachusetts (1996)zbMATHGoogle Scholar
  2. 2.
    Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Athena Scientific, Massachusetts (1997)Google Scholar
  3. 3.
    Bu, Y., Howe, B., Balazinska, M., Ernst, M.D.: Haloop: Efficient iterative data processing on large clusters. In: The 36th International Conference on Very Large Data Bases (VLDB 2010), Singapore (September 2010)Google Scholar
  4. 4.
    Chu, C.-T., Kim, S.K., Lin, Y.-A., Yu, Y., Bradski, G., Ng, A., Olukotun, K.: Map-reduce for machine learning on multicore. In: Advances in Neural Information Processing Systems 19 (NIPS 2006), pp. 281–288 (December 2006)Google Scholar
  5. 5.
    Dean, J., Ghemawat, S.: Mapreduce: Simplied data processing on large clusters. In: OSDI 2004, San Francisco, USA, pp. 137–150 (December 2004)Google Scholar
  6. 6.
    Ekanayake, J., Li, H., Zhang, B., Gunarathne, T., Bae, S.-H., Qiu, J., Fox, G.: Twister: A runtime for iterative mapreduce. In: The First International Workshop on MapReduce and its Applications, Chicago, USA (June 2010)Google Scholar
  7. 7.
    Golub, G.H., Van Loan, C.F.: Matrix Computations. The Johns Hopkins University Press, Baltimore (1996)zbMATHGoogle Scholar
  8. 8.
    Kung, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: A peta-scale graph mining system - implementation and observations. In: ICDM 2009, Miami, pp. 229–238 (December 2009)Google Scholar
  9. 9.
    Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. The Journal of Machine Learning Research 4, 1107–1149 (2003)MathSciNetGoogle Scholar
  10. 10.
    Lin, J., Dyer, C.: Data-Intensive Text Processing with MapReduce. Morgan & Claypool (2009)Google Scholar
  11. 11.
    Liu, C., Yang, H.-C., Fan, J., He, L.-W., Wang, Y.-M.: Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce. In: Proceedings of the 19th International World Wide Web Conference (WWW 2010), Raleigh, North Carolina, USA, April 26-30, pp. 681–690 (2010)Google Scholar
  12. 12.
    Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Graphlab: A new parallel framework for machine learning. In: Uncertainty in Artificial Intelligence (UAI), Catalina Island, USA (July 2010)Google Scholar
  13. 13.
    Maei, H.R., Sutton, R.S.: GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In: Proceedings of the Third Conference on Artificial General Intelligence, Lugano, Switzerland (2010)Google Scholar
  14. 14.
    Puterman, M.L.: Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, New York (1994)zbMATHGoogle Scholar
  15. 15.
    Silver, D., Sutton, R.S., Muller, M.: Reinforcement learning of local shape in the game of Go. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, pp. 1053–1058 (January 2007)Google Scholar
  16. 16.
    Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (1998)Google Scholar
  17. 17.
    Sutton, R.S., Maei, H.R., Precup, D., Bhatnagar, S., Silver, D., Szepesvari, C., Wiewiora, E.: Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: Proceedings of ICML 2009, Montreal, Canada, pp. 993–1000 (June 2009)Google Scholar
  18. 18.
    Szepesvari, C.: Algorithms for Reinforcement Learning. Morgan Kaufmann & Claypool (2010)Google Scholar
  19. 19.
    Tsitsiklis, J.N., Van Roy, B.: Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks (Special Issue on Computational Finance) 12(4), 694–703 (2001)Google Scholar
  20. 20.
    White, T.: Hadoop: The Definitive Guide. O’Reilly (2009)Google Scholar
  21. 21.
    Zinkevich, M., Weimer, M., Smola, A., Li, L.: Parallelized stochastic gradient descent. In: Proceedings of Advances in Neural Information Processing Systems 24 (NIPS 2010), Vancouver, Canada, pp. 2217–2225 (December 2010)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yuxi Li
    • 1
  • Dale Schuurmans
    • 2
  1. 1.College of Computer Science and EngineeringUniv. of Electronic Science and Technology of ChinaChengduChina
  2. 2.Department of Computing ScienceUniversity of AlbertaEdmontonCanada

Personalised recommendations