Abstract
The vehicle routing problem with pickup and delivery is one of the most important problems in the context of global urban population growth. Although small instances of such problems can be solved using various classical approaches, a fast (or real-time) route optimizer that respects real-world constraints (such as capacity and time window constraints) for medium- and large-size problems remains a challenge. In this work, we successfully applied, for the first time, a deep reinforcement learning approach (a modified JAMPR model) to solve the capacitated pickup and delivery problem with time windows (CPDPTW). We obtained a robust model that gives fast optimal solutions for small- to medium-size problems and fast suboptimal solutions for large-size (>200) problems.
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Soroka, Andrew Gennad’evich. Postgraduate student at the Faculty of Computational Mathematics and Cybernetics of Moscow State University. Received a bachelor’s degree with honors in 2018 and a master’s degree with honors in 2020. Research interests: reinforcement learning for combinatorial optimization problems, face biometrics, representation learning, and deep learning in astrophysics problems with limited labeled data.
Meshcheryakov, Alex Valer’evich. Candidate of Physical and Mathematical Sciences, mathematician at the Intelligent Information Technologies Department of the Faculty of Computational Mathematics and Cybernetics of Moscow State University, and senior researcher at the Space Research Institute of the Russian Academy of Sciences. Graduated from the Physics Faculty of Moscow State University in 2002. In 2011, received the degree of Candidate of Physical and Mathematical Sciences. In 2006–2007, completed a two-year training program at the Center for Astrophysics at Harvard University. Web of Science ResearcherID U-4496-2017. Since 2014, has been engaged in research on applying machine learning and deep learning methods to problems of observational astrophysics; a member of the Russian consortium of the SRG/EROSITA space mission. Research interests: astroinformatics, machine learning and deep learning in problems with limited labeled data, and reinforcement learning for combinatorial optimization problems.
Gerasimov, Sergey Valer’evich. Staff member of the Intelligent Information Technologies Department of the Faculty of Computational Mathematics and Cybernetics of Moscow State University and author of the lecture course “Systems of Distributed Storage and Big Data Processing” in the master’s program “Intellectual Analysis of Big Data” at the same faculty. Research interests: machine learning and optimization methods in finance; DevOps, MLOps, and ModelOps technologies.
Translated by O. Pismenov
Cite this article
Soroka, A.G., Meshcheryakov, A.V. & Gerasimov, S.V. Deep Reinforcement Learning for the Capacitated Pickup and Delivery Problem with Time Windows. Pattern Recognit. Image Anal. 33, 169–178 (2023). https://doi.org/10.1134/S1054661823020165
DOI: https://doi.org/10.1134/S1054661823020165