Abstract
We propose the first hardware architecture for an action-selection Policy Generator suitable for Reinforcement Learning hardware accelerators. The system produces outputs for random, greedy and ε-greedy action-selection policies within the same circuit. It requires a modest amount of hardware resources, dissipates little power, and is fast enough to be integrated into state-of-the-art Reinforcement Learning hardware accelerators. The architecture is designed to work with Q-Matrix based Reinforcement Learning algorithms such as Q-Learning and SARSA.
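As a software illustration of the three policies named in the abstract, the following minimal Python sketch selects an action from one row of a Q-Matrix. It is not the authors' hardware circuit; the function name `select_action`, its parameters, and the choice of a single `epsilon` value to cover all three policies are assumptions made for this example only.

```python
import random


def select_action(q_row, epsilon, rng):
    """Pick an action index from one Q-Matrix row (hypothetical helper).

    epsilon = 1.0 behaves as a random policy, epsilon = 0.0 as a greedy
    policy, and any value in between as an epsilon-greedy policy.
    """
    # Explore: with probability epsilon, pick a uniformly random action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Exploit: otherwise pick the action with the largest Q-value
    # (ties resolve to the lowest index).
    return max(range(len(q_row)), key=lambda a: q_row[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q_row = [0.1, 0.9, 0.3]  # illustrative Q-values for one state
    print(select_action(q_row, 0.0, rng))  # greedy choice
    print(select_action(q_row, 0.1, rng))  # epsilon-greedy choice
    print(select_action(q_row, 1.0, rng))  # random choice
```

Using one threshold comparison for all three policies mirrors the idea of serving them from the same logic, although the actual circuit may be organized differently.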
Acknowledgments
The authors would like to thank Xilinx Inc. for providing FPGA hardware and software tools through the Xilinx University Program.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cardarilli, G.C. et al. (2021). An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2020. Lecture Notes in Electrical Engineering, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-030-66729-0_32
DOI: https://doi.org/10.1007/978-3-030-66729-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66728-3
Online ISBN: 978-3-030-66729-0
eBook Packages: Engineering, Engineering (R0)