Deep Reinforcement Learning for the Capacitated Pickup and Delivery Problem with Time Windows

  • SELECTED CONFERENCE PAPERS
Pattern Recognition and Image Analysis

Abstract

The vehicle routing problem with pickup and delivery is one of the most important problems in the context of global urban population growth. Although small instances of such problems can be solved with a variety of classical approaches, building a fast (or real-time) route optimizer that handles real-world constraints (such as capacity and time window constraints) for medium- and large-size problems remains a challenge. In this work, we apply, for the first time, a deep reinforcement learning approach (a modified JAMPR model) to the capacitated pickup and delivery problem with time windows (CPDPTW). The resulting model is robust: it quickly finds optimal solutions for small- and medium-size problems and fast suboptimal solutions for large-size (>200) problems.
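To make the CPDPTW constraint set concrete, here is a minimal sketch of the feasibility mask that a constructive (node-by-node) routing policy typically checks before choosing the next stop, together with a greedy nearest-node rollout that stands in for a learned policy. This is not the authors' modified JAMPR model. The instance layout (pickup `i` paired with delivery `i + n_requests`), Euclidean travel times, and the names `Instance`, `VehicleState`, `feasible_mask`, `apply_move`, and `greedy_rollout` are hypothetical, and the depot's closing time is deliberately not checked.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical CPDPTW instance layout (an assumption, not the paper's).

    Node 0 is the depot. Request i (1 <= i <= n_requests) has its pickup at
    node i and its delivery at node i + n_requests.
    """
    coords: list      # (x, y) per node; travel time = Euclidean distance
    demand: list      # +q at a pickup, -q at its delivery, 0 at the depot
    tw: list          # (earliest, latest) service start time per node
    service: list     # service duration per node
    capacity: float   # vehicle capacity Q
    n_requests: int

    def travel(self, a, b):
        return math.dist(self.coords[a], self.coords[b])


@dataclass
class VehicleState:
    node: int = 0          # current location (starts at the depot)
    time: float = 0.0      # time after finishing service at `node`
    load: float = 0.0      # current load on board
    onboard: set = field(default_factory=set)  # picked up, not yet delivered


def feasible_mask(inst, state, visited):
    """True for each request node the vehicle may visit next.

    Index 0 (depot) stays False; the rollout returns to the depot
    when no request node is feasible.
    """
    n = inst.n_requests
    mask = [False] * len(inst.coords)
    for j in range(1, 2 * n + 1):
        if j in visited:
            continue
        if j > n and (j - n) not in state.onboard:
            continue  # precedence: a delivery needs its pickup on board
        if state.load + inst.demand[j] > inst.capacity:
            continue  # capacity constraint
        start = max(state.time + inst.travel(state.node, j), inst.tw[j][0])
        if start > inst.tw[j][1]:
            continue  # time window: waiting is allowed, lateness is not
        mask[j] = True
    return mask


def apply_move(inst, state, j):
    """Move the vehicle to node j and update time, load, and on-board set."""
    start = max(state.time + inst.travel(state.node, j), inst.tw[j][0])
    state.time = start + inst.service[j]
    state.load += inst.demand[j]
    if j <= inst.n_requests:
        state.onboard.add(j)
    else:
        state.onboard.discard(j - inst.n_requests)
    state.node = j


def greedy_rollout(inst, max_vehicles=10):
    """Build routes one vehicle at a time, taking the nearest feasible node.

    The nearest-node rule is a stand-in for a learned policy.
    """
    targets = set(range(1, 2 * inst.n_requests + 1))
    visited, routes, total = set(), [], 0.0
    while visited != targets:
        if len(routes) == max_vehicles:
            raise RuntimeError("vehicle limit reached")
        state, route = VehicleState(), [0]
        while True:
            mask = feasible_mask(inst, state, visited)
            candidates = [j for j, ok in enumerate(mask) if ok]
            if not candidates:
                break
            j = min(candidates, key=lambda c: inst.travel(state.node, c))
            total += inst.travel(state.node, j)
            apply_move(inst, state, j)
            visited.add(j)
            route.append(j)
        if len(route) == 1 or state.onboard:
            raise RuntimeError("greedy construction got stuck")
        total += inst.travel(state.node, 0)
        routes.append(route + [0])
    return routes, total


if __name__ == "__main__":
    # Two requests: pickups at nodes 1, 2; deliveries at nodes 3, 4.
    inst = Instance(
        coords=[(0, 0), (1, 0), (0, 1), (2, 0), (0, 2)],
        demand=[0, 1, 1, -1, -1],
        tw=[(0, 100)] * 5,
        service=[0] * 5,
        capacity=1,
        n_requests=2,
    )
    routes, total = greedy_rollout(inst)
    print(routes)  # [[0, 1, 3, 2, 4, 0]]
```

With capacity 1, the vehicle in the example must deliver request 1 before it can pick up request 2, which is exactly what the mask enforces. In attention-based construction models, a mask of this kind is typically applied to the decoder's scores so that infeasible nodes get zero probability; the nearest-node rule would then be replaced by sampling or greedy decoding from the policy.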

Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.
Fig. 6.

REFERENCES

  1. B. Balaji, J. Bell-Masterson, E. Bilgin, A. Damianou, P. M. Garcia, A. Jain, R. Luo, A. Maggiar, B. Narayanaswamy, and Ch. Ye, “ORL: Reinforcement learning benchmarks for online stochastic optimization problems,” (2019). arXiv:1911.10641 [cs.LG]

  2. K. Braekers, K. Ramaekers, I. Van Nieuwenhuyse, “The vehicle routing problem: State of the art classification and review,” Comput. Ind. Eng. 99, 300–313 (2016). https://doi.org/10.1016/j.cie.2015.12.007

  3. O. Bräysy and M. Gendreau, “Vehicle routing problem with time windows, Part I: Route construction and local search algorithms,” Transp. Sci. 39, 104–118 (2005). https://doi.org/10.1287/trsc.1030.0056

  4. X. Chen and Yu. Tian, “Learning to perform local rewriting for combinatorial optimization,” Adv. Neural Inf. Process. Syst. 32 (2019).

  5. G. Clarke and J. W. Wright, “Scheduling of vehicles from a central depot to a number of delivery points,” Oper. Res. 12, 568–581 (1964). https://doi.org/10.1287/opre.12.4.568

  6. G. Dantzig, R. Fulkerson, and S. Johnson, “Solution of a large-scale traveling-salesman problem,” J. Oper. Res. Soc. Am. 2, 393–410 (1954). https://doi.org/10.1007/978-3-540-68279-0_1

  7. G. B. Dantzig and J. H. Ramser, “The truck dispatching problem,” Manage. Sci. 6, 80–91 (1959). https://doi.org/10.1287/mnsc.6.1.80

  8. J. K. Falkner and L. Schmidt-Thieme, “Learning to solve vehicle routing problems with time windows through joint attention,” (2020). arXiv:2006.09100 [cs.LG]

  9. W. Kool, H. Van Hoof, and M. Welling, “Attention, learn to solve routing problems!,” (2018). arXiv:1803.08475 [stat.ML]

  10. S. Li, Zh. Yan, and C. Wu, “Learning to delegate for large-scale vehicle routing,” Adv. Neural Inf. Process. Syst. 34 (2021). https://doi.org/10.48550/arXiv.2107.04139

  11. S. Lin and B. W. Kernighan, “An effective heuristic algorithm for the traveling-salesman problem,” Oper. Res. 21, 498–516 (1973). https://doi.org/10.1287/opre.21.2.498

  12. J. D. Little, K. G. Murty, D. W. Sweeney, and C. Karel, “An algorithm for the traveling salesman problem,” Oper. Res. 11, 972–989 (1963). https://doi.org/10.1287/opre.11.6.972

  13. H. Lu, X. Zhang, and Sh. Yang, “A learning-based iterative method for solving vehicle routing problems,” in Int. Conf. on Learning Representations (2019).

  14. M. Nazari, A. Oroojlooy, L. Snyder, and M. Takáč, “Reinforcement learning for solving the vehicle routing problem,” Adv. Neural Inf. Process. Syst. 31 (2018). https://doi.org/10.48550/arXiv.1802.04240

  15. I. Or, “Traveling salesman type combinatorial problems and their relation to the logistics of regional blood banking,” PhD Thesis (Northwestern Univ., 1976)

  16. S. N. Parragh, K. F. Doerner, and R. F. Hartl, “A survey on pickup and delivery problems,” J. Betriebswirtschaft 58 (1), 21–51 (2008). https://doi.org/10.1007/s11301-008-0033-7

  17. L. Perron, “Operations research and constraint programming at Google,” in Principles and Practice of Constraint Programming—CP 2011, Lecture Notes in Computer Science, Vol. 6876 (Springer, Berlin, 2011), p. 2. https://doi.org/10.1007/978-3-642-23786-7_2

  18. Zh. T. Qin, H. Zhu, and J. Ye, “Reinforcement learning for ridesharing: An extended survey,” Transp. Res. Part C: Emerging Technol. 144, 103852 (2022). https://doi.org/10.1016/j.trc.2022.103852

  19. M. W. Savelsbergh, “The vehicle routing problem with time windows: Minimizing route duration,” ORSA J. Comput. 4, 146–154 (1992). https://doi.org/10.1287/ijoc.4.2.146

  20. M. M. Solomon, “Algorithms for the vehicle routing and scheduling problems with time window constraints,” Oper. Res. 35, 254–265 (1987). https://doi.org/10.1287/opre.35.2.254

  21. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Adv. Neural Inf. Process. Syst. 30 (2017). https://doi.org/10.48550/arXiv.1706.03762

  22. T. Vidal, “Hybrid genetic search for the CVRP: Open-source implementation and swap* neighborhood,” Comput. Oper. Res. 140, 105643 (2022). https://doi.org/10.48550/arXiv.2012.10384

  23. T. Vidal, T. G. Crainic, M. Gendreau, N. Lahrichi, and W. Rei, “A hybrid genetic algorithm for multidepot and periodic vehicle routing problems,” Oper. Res. 60, 611–624 (2012). https://doi.org/10.1287/opre.1120.1048

  24. O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” Adv. Neural Inf. Process. Syst. 28 (2015). https://doi.org/10.48550/arXiv.1506.03134

  25. G. Nemhauser and L. Wolsey, Integer and Combinatorial Optimization (John Wiley and Sons, 1999). https://doi.org/10.1002/9781118627372

Author information

Correspondence to A. G. Soroka, A. V. Meshcheryakov or S. V. Gerasimov.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Soroka, Andrew Gennad’evich. Postgraduate student at the Faculty of Computational Mathematics and Cybernetics, Moscow State University. Received a bachelor’s degree with honors in 2018 and a master’s degree with honors in 2020. Scientific interests: reinforcement learning for combinatorial optimization problems, face biometrics, representation learning, and deep learning for astrophysics problems with limited labeled data.

Meshcheryakov, Alex Valer’evich. Candidate of Physical and Mathematical Sciences; mathematician at the Intelligent Information Technologies Department of the Faculty of Computational Mathematics and Cybernetics, Moscow State University; senior researcher at the Space Research Institute of the Russian Academy of Sciences. Graduated from the Physics Faculty of Moscow State University in 2002 and received the degree of Candidate of Physical and Mathematical Sciences in 2011. In 2006–2007, completed a two-year training program at the Center for Astrophysics at Harvard University. Web of Science ResearcherID U-4496-2017. Since 2014, has applied machine learning and deep learning methods to problems of observational astrophysics; member of the Russian consortium of the SRG/eROSITA space mission. Research interests: astroinformatics, machine learning and deep learning for problems with limited labeled data, and reinforcement learning for combinatorial optimization problems.

Gerasimov, Sergey Valer’evich. Employee of the Intelligent Information Technologies Department of the Faculty of Computational Mathematics and Cybernetics, Moscow State University; author of the lecture course “Systems of Distributed Storage and Big Data Processing” in the master’s program “Intellectual Analysis of Big Data” at the same faculty. Research interests: machine learning and optimization methods in finance, and DevOps, MLOps, and ModelOps technologies.

Translated by O. Pismenov

About this article

Cite this article

Soroka, A.G., Meshcheryakov, A.V. & Gerasimov, S.V. Deep Reinforcement Learning for the Capacitated Pickup and Delivery Problem with Time Windows. Pattern Recognit. Image Anal. 33, 169–178 (2023). https://doi.org/10.1134/S1054661823020165

  • DOI: https://doi.org/10.1134/S1054661823020165
