Skip to main content

Policy Learning Using SPSA

Part of the Lecture Notes in Computer Science book series (LNTCS,volume 11141)


We analyze the use of simultaneous perturbation stochastic approximation (SPSA), a stochastic optimization technique, for solving reinforcement learning problems. In particular, we consider settings of partial observability and leverage the short-term memory capabilities of echo state networks (ESNs) to learn parameterized control policies. Using SPSA, we propose three different variants to adapt the weight matrices of an ESN to the task at hand. Experimental results on classic control problems with both discrete and continuous action spaces reveal that ESNs trained using SPSA approaches outperform conventional ESNs trained using temporal difference and policy gradient methods.


  • Echo state networks
  • Recurrent neural networks
  • Reinforcement learning
  • Stochastic optimization

This is a preview of subscription content, access via your institution.

Buying options

USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-030-01424-7_1
  • Chapter length: 10 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
USD   89.00
Price excludes VAT (USA)
  • ISBN: 978-3-030-01424-7
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   119.99
Price excludes VAT (USA)
Fig. 1.
Fig. 2.
Fig. 3.


  1. Bertsekas, D.P.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

    MATH  Google Scholar 

  2. Brockman, G., et al.: OpenAI Gym. arXiv:1606.01540 (2016)

  3. Chatzidimitriou, K.C., Mitkas, P.A.: A NEAT way for evolving echo state networks. In: Proceedings of European Conference on Artificial Intelligence (2010)

    Google Scholar 

  4. Duan, Y., Chen, X., Houthooft, R., Schulman, J., Abbeel, P.: Benchmarking deep reinforcement learning for continuous control. In: Proceedings of International Conference on Machine Learning (2016)

    Google Scholar 

  5. Jäger, H.: The “echo state” approach to analysing and training recurrent neural networks. Technical report 148, GMD (2001)

    Google Scholar 

  6. Jiang, F., Berry, H., Schoenauer, M.: Supervised and evolutionary learning of echo state networks. In: Proceedings of International Conference on Parallel Problem Solving from Nature (2008)

    Google Scholar 

  7. Koprinkova-Hristova, P.: Three approaches to train echo state network actors of adaptive critic design. In: Proceeding of International Conference on Artificial Neural Networks (2016)

    Google Scholar 

  8. Lin, L.J.: Reinforcement learning for robots using neural networks. Technical reports CMU-CS-93-103, Carnegie-Mellon University (1993)

    Google Scholar 

  9. Lukoševičius, M.: A practical guide to applying echo state networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 659–686. Springer, Heidelberg (2012).

    CrossRef  Google Scholar 

  10. Mania, H., Guy, A., Recht, B.: Simple random search provides a competitive approach to reinforcement learning. arXiv:1803.07055 (2018)

  11. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: Proceedings of International Conference on Machine Learning (2016)

    Google Scholar 

  12. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)

    CrossRef  Google Scholar 

  13. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

    MathSciNet  CrossRef  Google Scholar 

  14. Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864 (2017)

  15. Schmidhuber, J., Wierstra, D., Gagliolo, M., Gomez, F.: Training recurrent networks by Evolino. Neural Comput. 19(3), 757–779 (2007)

    CrossRef  Google Scholar 

  16. Schrauwen, B., Wardermann, M., Verstraeten, D., Steil, J.J., Stroobandt, D.: Improving reservoirs using intrinsic plasticity. Neurocomputing 71(7–9), 1159–1171 (2008)

    CrossRef  Google Scholar 

  17. Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550(7676), 354 (2017)

    CrossRef  Google Scholar 

  18. Spall, J.C.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control. 37(3), 332–341 (1992)

    MathSciNet  CrossRef  Google Scholar 

  19. Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv:1712.06567 (2017)

  20. Sutton, R.: Introduction to reinforcement learning with function approximation. In: Tutorial at the Conference on Neural Information Processing Systems (2015)

    Google Scholar 

  21. Sutton, R.S., Barto, A.G., et al.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

    Google Scholar 

Download references

Author information

Authors and Affiliations


Corresponding author

Correspondence to R. Ramamurthy .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Verify currency and authenticity via CrossMark

Cite this paper

Ramamurthy, R., Bauckhage, C., Sifa, R., Wrobel, S. (2018). Policy Learning Using SPSA. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. ICANN 2018. Lecture Notes in Computer Science(), vol 11141. Springer, Cham.

Download citation

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01423-0

  • Online ISBN: 978-3-030-01424-7

  • eBook Packages: Computer ScienceComputer Science (R0)