Abstract
Active deployment of Deep Reinforcement Learning (DRL) based controllers on safety-critical embedded platforms requires model compaction. Neural pruning has been studied extensively for CNNs in computer vision, but such approaches do not guarantee the preservation of safety in the context of DRL: a pruned network that converges to high reward may still violate safety requirements. This paper proposes PruVer, a framework that iteratively refines a pruned network with verification in the loop, producing a compressed network that satisfies safety specifications with formal guarantees over small time horizons. We demonstrate the method in model-free RL environments, achieving 40–60% compaction, significant latency benefits (3 to 10 times), and bounded guarantees for safety properties.
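The verification-in-the-loop idea from the abstract can be illustrated with a minimal sketch. This is not the authors' actual algorithm: `magnitude_prune`, `prune_with_verification`, and the `verify` callback are hypothetical names introduced here for illustration, and in PruVer the callback would be a formal verifier checking the safety specification rather than the toy predicate used below. The sketch increases sparsity in steps and keeps the last pruned network that the verifier accepts.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def prune_with_verification(weights, verify, target_sparsity, step=0.1):
    """Iteratively raise sparsity; stop when verification fails.

    `verify` stands in for a formal safety check on the pruned network;
    a real pipeline would refine/retrain on failure instead of stopping.
    """
    best, achieved = weights.copy(), 0.0
    n_steps = int(round(target_sparsity / step))
    for i in range(1, n_steps + 1):
        sparsity = i * step
        candidate = magnitude_prune(weights, sparsity)
        if not verify(candidate):
            break  # counterexample found: keep the last verified network
        best, achieved = candidate, sparsity
    return best, achieved
```

As a usage example, pruning an 8×8 weight matrix to 60% sparsity while requiring at least 20 nonzero weights succeeds, whereas an unsatisfiable requirement leaves the network unpruned.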
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Gangopadhyay, B., Dasgupta, P., Dey, S. (2024). PruVer: Verification Assisted Pruning for Deep Reinforcement Learning. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds.) PRICAI 2023: Trends in Artificial Intelligence. Lecture Notes in Computer Science, vol. 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7018-6
Online ISBN: 978-981-99-7019-3
eBook Packages: Computer Science (R0)