Abstract
In this paper, we present a new approach to complex routing problems with an improved state representation that utilizes model complexity more effectively than previous methods. We enable this by training from temporal differences; specifically, we employ Q-Learning. We show that our approach achieves state-of-the-art performance among autoregressive policies that construct solutions by sequentially inserting nodes, evaluated on the Capacitated Vehicle Routing Problem (CVRP). Additionally, we are the first to tackle the Multiple Depot Vehicle Routing Problem (MDVRP) with Reinforcement Learning (RL) and demonstrate that this problem class benefits greatly from our approach compared to other Machine Learning (ML) methods.
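The abstract describes training a node-insertion policy from temporal differences with Q-Learning. As a hedged illustration of that idea only, the sketch below applies a one-step tabular Q-learning update to a toy tour-construction problem. The instance, the tabular Q-function, and all names here are our own illustrative assumptions; the paper's actual method (RP-DQN) uses a learned neural Q-function and a richer state representation, not this tabular stand-in.

```python
import random

random.seed(0)

# Toy symmetric distance matrix over 4 nodes (node 0 = depot).
dist = [
    [0, 2, 3, 4],
    [2, 0, 2, 3],
    [3, 2, 0, 2],
    [4, 3, 2, 0],
]

# Q[(state, action)] -> estimated return, where a state is
# (current_node, frozenset_of_unvisited_nodes).
Q = {}

def q(state, action):
    return Q.get((state, action), 0.0)

alpha, gamma, eps = 0.1, 1.0, 0.2

def episode():
    """Construct one tour from the depot, learning with one-step Q-learning."""
    current, unvisited = 0, frozenset([1, 2, 3])
    total = 0.0
    while unvisited:
        state = (current, unvisited)
        actions = sorted(unvisited)  # feasible actions = unvisited nodes
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda x: q(state, x))
        reward = -dist[current][a]  # negative travel cost
        next_unvisited = unvisited - {a}
        next_state = (a, next_unvisited)
        if next_unvisited:
            # Temporal-difference target bootstraps from the next state.
            target = reward + gamma * max(q(next_state, x) for x in next_unvisited)
        else:
            target = reward - dist[a][0]  # close the tour at the depot
        Q[(state, a)] = q(state, a) + alpha * (target - q(state, a))
        total += reward
        current, unvisited = a, next_unvisited
    return total - dist[current][0]  # negative total tour length

for _ in range(2000):
    episode()

eps = 0.0  # greedy rollout after training
print(episode())  # negative cost of the tour found by the greedy policy
```

The construction loop mirrors the autoregressive setting in the abstract: at each step the state encodes the partially built solution, feasible actions are the remaining nodes, and the TD update propagates cost information backward without waiting for the complete tour.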
A. Bdeir, S. Boeder, T. Dernedde and K. Tkachuk—Equal contribution.
Acknowledgement
This work is co-funded by the research project L2O (https://www.ismll.uni-hildesheim.de/projekte/l2o_en.html), supported by the German Federal Ministry of Education and Research (BMBF) under grant agreement no. 01IS20013A, and by the European Regional Development Fund project TrAmP (https://www.ismll.uni-hildesheim.de/projekte/tramp.html) under grant agreement no. 85023841.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Bdeir, A., Boeder, S., Dernedde, T., Tkachuk, K., Falkner, J.K., Schmidt-Thieme, L. (2021). RP-DQN: An Application of Q-Learning to Vehicle Routing Problems. In: Edelkamp, S., Möller, R., Rueckert, E. (eds.) KI 2021: Advances in Artificial Intelligence. KI 2021. Lecture Notes in Computer Science, vol. 12873. Springer, Cham. https://doi.org/10.1007/978-3-030-87626-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87625-8
Online ISBN: 978-3-030-87626-5