An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators

Cardarilli, Gian Carlo; Di Nunzio, Luca; Fazzolari, Rocco; Giardino, Daniele; Matta, Marco; Re, Marco; Spanò, Sergio

doi:10.1007/978-3-030-66729-0_32

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 738)

Included in the following conference series:

International Conference on Applications in Electronics Pervading Industry, Environment and Society

Abstract

We propose the first hardware architecture for an action-selection Policy Generator suitable for Reinforcement Learning hardware accelerators. The system produces outputs for random, greedy and ε-greedy action-selection policies within the same circuit. It requires few hardware resources and has limited power dissipation, and its high computational speed allows it to be integrated into state-of-the-art Reinforcement Learning hardware accelerators. The architecture is designed to work with Q-Matrix-based Reinforcement Learning algorithms such as Q-Learning and SARSA.
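The abstract describes a single circuit that emits a random, greedy or ε-greedy action from one row of the Q-Matrix. The Python model below is a minimal software sketch of that behaviour under stated assumptions, not the authors' architecture: the 16-bit LFSR random source, the integer ε threshold, the lowest-index tie-breaking and every name in it (Policy, LFSR16, eps_to_q16, greedy_action, policy_generator) are hypothetical choices made for illustration. It computes both candidate actions on every call and lets the mode input select between them, which is one plausible way to serve all three policies from shared logic.

```python
from enum import Enum
from typing import Sequence


class Policy(Enum):
    """Hypothetical mode-select input: one generator serves all three policies."""
    RANDOM = 0
    GREEDY = 1
    EPSILON_GREEDY = 2


class LFSR16:
    """Assumed random source: 16-bit Fibonacci LFSR, taps 16, 14, 13, 11.

    A common low-cost hardware PRNG with period 65535; outputs lie in 1..65535.
    """

    def __init__(self, seed: int = 0xACE1) -> None:
        if seed & 0xFFFF == 0:
            raise ValueError("LFSR seed must be non-zero")  # all-zero state locks up
        self.state = seed & 0xFFFF

    def step(self) -> int:
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1  # XOR of the tap bits
        self.state = (s >> 1) | (bit << 15)             # shift right, feed back at MSB
        return self.state


def eps_to_q16(eps: float) -> int:
    """Quantize epsilon in [0, 1] to an integer threshold in 0..65536."""
    return max(0, min(65536, round(eps * 65536)))


def greedy_action(q_row: Sequence[int]) -> int:
    """Index of the largest Q-value; ties go to the lowest index.

    The linear scan stands in for the comparator tree a circuit would use.
    """
    best = 0
    for a in range(1, len(q_row)):
        if q_row[a] > q_row[best]:
            best = a
    return best


def policy_generator(q_row: Sequence[int], mode: Policy,
                     eps_q16: int, rng: LFSR16) -> int:
    """Return the action for one state (one row of the Q-Matrix).

    Both candidates are computed on every call and the mode only drives the
    final selection, like a multiplexer at the end of a shared datapath.
    """
    greedy_a = greedy_action(q_row)
    random_a = rng.step() % len(q_row)  # small modulo bias unless len is a power of 2
    explore = rng.step() < eps_q16      # true with probability close to epsilon
    if mode is Policy.GREEDY:
        return greedy_a
    if mode is Policy.RANDOM:
        return random_a
    return random_a if explore else greedy_a


if __name__ == "__main__":
    rng = LFSR16()
    q_row = [3, 9, 4, 9]  # made-up Q-values for a single state
    eps = eps_to_q16(0.1)
    counts = [0] * len(q_row)
    for _ in range(10_000):
        counts[policy_generator(q_row, Policy.EPSILON_GREEDY, eps, rng)] += 1
    print(counts)  # action 1 (the first maximum) is expected to dominate
```

In a Q-Learning accelerator the greedy candidate also corresponds to the maximum over the next state's Q-values used in the update target, while SARSA instead reuses the action actually chosen by the ε-greedy policy. This is standard for the two algorithms; how the paper's circuit is wired into either one is not described in this excerpt.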

Acknowledgments

The authors would like to thank Xilinx Inc. for providing FPGA hardware and software tools through the Xilinx University Program.

Author information

Authors and Affiliations

Department of Electronic Engineering, University of Rome Tor Vergata, Via del Politecnico 1, 00133, Rome, Italy
Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Marco Re & Sergio Spanò

Corresponding author

Correspondence to Sergio Spanò.

Editor information

Editors and Affiliations

DII, University of Pisa, Pisa, Italy
Sergio Saponara
DITEN, University of Genoa, Genoa, Italy
Alessandro De Gloria

About this paper

Cite this paper

Cardarilli, G.C. et al. (2021). An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators. In: Saponara, S., De Gloria, A. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2020. Lecture Notes in Electrical Engineering, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-030-66729-0_32

DOI: https://doi.org/10.1007/978-3-030-66729-0_32
Published: 26 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66728-3
Online ISBN: 978-3-030-66729-0
eBook Packages: Engineering, Engineering (R0)