Abstract
Automated warehouses are widely deployed in large-scale distribution centers because they reduce operational cost and improve throughput. In an automated warehouse, orders are fulfilled by battery-powered automated guided vehicles (AGVs) that transport movable shelves or boxes. Battery management is therefore crucial to productivity: recovering depleted batteries can be time-consuming and reduces the number of available robots, which can seriously degrade overall system performance. In this paper, we propose to solve the battery management problem using deep reinforcement learning (DRL). We first formulate the battery management problem as a Markov decision process (MDP). We then show that the state-of-the-art DRL method, which uses Gaussian noise to enforce exploration, can perform poorly in the formulated MDP, and we present a novel algorithm called TD3-ARL that explores effectively by regulating the magnitude of the output action. Finally, extensive empirical evaluations confirm the superiority of our algorithm over both the state-of-the-art method and rule-based policies.
Keywords
- Automated warehouses
- Battery management
- Deep reinforcement learning
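As a rough illustration of the Gaussian-noise exploration that the abstract attributes to the state-of-the-art method, the sketch below perturbs a deterministic actor's output with zero-mean Gaussian noise and clips the result to the action bounds. The function name, the noise scale `sigma`, the bounds, and the example action are all assumptions made for illustration; this is not the paper's TD3-ARL algorithm, whose details are not given here.

```python
import random


def gaussian_exploration(action, sigma, low, high, rng=random):
    """Add zero-mean Gaussian noise to each action component, then clip.

    All names and default choices here are hypothetical; the source only
    states that Gaussian noise is used to enforce exploration.
    """
    noisy = [a + rng.gauss(0.0, sigma) for a in action]
    # Clipping keeps the perturbed action inside the valid range [low, high].
    return [min(max(x, low), high) for x in noisy]


# Hypothetical actor output for a 3-dimensional action bounded in [-1, 1].
rng = random.Random(0)  # seeded generator so the run is repeatable
actor_output = [0.2, -0.95, 1.0]
explored = gaussian_exploration(actor_output, sigma=0.1, low=-1.0, high=1.0, rng=rng)
print(explored)
```

Note that a component already at a bound (such as `1.0` above) can only move inward after clipping, so the effective noise near the bounds is one-sided. Whether this is related to the poor performance reported in the abstract is not stated in the source.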
Acknowledgements
This work was supported by Alibaba Group through the Alibaba Innovative Research (AIR) Program and the Alibaba-NTU Joint Research Institute (JRI), Nanyang Technological University, Singapore.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Deng, Y., An, B., Qiu, Z., Li, L., Wang, Y., Xu, Y. (2020). Battery Management for Automated Warehouses via Deep Reinforcement Learning. In: Taylor, M.E., Yu, Y., Elkind, E., Gao, Y. (eds) Distributed Artificial Intelligence. DAI 2020. Lecture Notes in Computer Science, vol 12547. Springer, Cham. https://doi.org/10.1007/978-3-030-64096-5_9
DOI: https://doi.org/10.1007/978-3-030-64096-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64095-8
Online ISBN: 978-3-030-64096-5
eBook Packages: Computer Science, Computer Science (R0)