An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators

  • Conference paper
Applications in Electronics Pervading Industry, Environment and Society (ApplePies 2020)

Abstract

We propose the first hardware architecture for an action-selection Policy Generator suitable for Reinforcement Learning hardware accelerators. The system produces outputs for random, greedy and ε-greedy action-selection policies within the same circuit. It requires few hardware resources, dissipates little power, and its high computational speed allows it to be integrated into state-of-the-art Reinforcement Learning hardware accelerators. Our architecture is meant to work with Q-Matrix based Reinforcement Learning algorithms such as Q-Learning and SARSA.
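The three policies named in the abstract can be illustrated with a small software reference model. The sketch below is not the authors' circuit: it is a minimal Python illustration, assuming one row of the Q-Matrix is available as a list of Q-values for the current state. The names `select_action`, `q_row`, `epsilon` and `rng` are hypothetical and chosen only for this example. It shows how a single parameter ε can cover all three cases: ε = 0 gives the greedy policy, ε = 1 gives the random policy, and any value in between gives ε-greedy.

```python
import random


def select_action(q_row, epsilon, rng=random):
    """Pick an action index from one row of a Q-matrix (illustrative only).

    q_row   -- Q-values of every action in the current state (assumed layout)
    epsilon -- exploration probability in [0, 1]
    rng     -- source of randomness exposing random() and randrange()
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    # Explore: with probability epsilon, pick a uniformly random action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the largest Q-value (first one on ties).
    return max(range(len(q_row)), key=q_row.__getitem__)


if __name__ == "__main__":
    q_values = [0.1, 0.7, 0.3]
    print(select_action(q_values, 0.0))  # greedy
    print(select_action(q_values, 1.0))  # random
    print(select_action(q_values, 0.1))  # epsilon-greedy
```

Because `rng.random()` returns a value in [0, 1), the comparison never succeeds for ε = 0 and always succeeds for ε = 1, so the two limiting policies fall out of the same code path. A hardware implementation would typically replace the floating-point comparison with a pseudo-random number generator and a fixed-point threshold, but that detail is an assumption and is not taken from the abstract.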



Acknowledgments

The authors would like to thank Xilinx Inc. for providing FPGA hardware and software tools through the Xilinx University Program.

Author information

Corresponding author

Correspondence to Sergio Spanò.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cardarilli, G.C. et al. (2021). An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2020. Lecture Notes in Electrical Engineering, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-030-66729-0_32

  • DOI: https://doi.org/10.1007/978-3-030-66729-0_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66728-3

  • Online ISBN: 978-3-030-66729-0

  • eBook Packages: Engineering, Engineering (R0)
