Online food ordering delivery strategies based on deep reinforcement learning

Applied Intelligence

Abstract

With the rapid development of Online to Offline (O2O) business, millions of transactions are performed on popular online food ordering platforms each day. Efficient dispatching of orders and dynamic adjustment of delivery routes are critical to the success of O2O platforms. However, the vast volume of transactions and the computational complexity of delivery routing pose significant challenges to efficient order dispatching. The action of dispatching an order and the resulting state transition of the couriers form a Markov decision process (MDP), and reinforcement learning has proven capable of solving MDPs. This paper proposes a Double Deep Q-Network (Double-DQN) based reinforcement learning framework that gradually tests and learns the order dispatching policy by interacting with an O2O simulation model developed in SUMO. Preliminary experimental results using real order data demonstrate the effectiveness and efficiency of the proposed Double-DQN based order dispatcher. In addition, different state encoding schemes are designed and tested to improve the performance of the Double-DQN based dispatcher.
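As a rough illustration of the Double-DQN idea named in the abstract, the sketch below computes the learning target for a single transition. It is not the authors' implementation: the function names, the discount factor, and the sample Q-values are assumptions made for the example, and reading each action as "assign the order to courier i" is only one plausible interpretation of the dispatching MDP. The point it shows is the standard Double-DQN decoupling, in which the online network selects the next action and the target network evaluates it.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Bootstrapped target for one transition under Double DQN."""
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        return reward
    # The online network selects the action (assumed here: which courier gets the order).
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # The target network evaluates the action that the online network selected.
    return reward + gamma * next_q_target[best_action]


def dqn_target(
    reward: float,
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Plain DQN target for comparison: the target network both selects and evaluates."""
    return reward if done else reward + gamma * max(next_q_target)


# Hypothetical Q-values for three candidate couriers in the next state.
q_online = [1.0, 2.5, 2.0]
q_target = [1.2, 1.8, 3.0]

y_double = double_dqn_target(1.0, q_online, q_target, gamma=0.9)
y_plain = dqn_target(1.0, q_target, gamma=0.9)
```

With these made-up numbers the online network prefers the second courier, so the Double-DQN target bootstraps from the target network's value for that courier (1.8) rather than from its maximum (3.0). The resulting target is lower than the plain DQN target, which is how Double DQN reduces overestimation of action values.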

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grant Nos. 71771035 and 71831003).

Author information

Corresponding author

Correspondence to Guangyu Zou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zou, G., Tang, J., Yilmaz, L. et al. Online food ordering delivery strategies based on deep reinforcement learning. Appl Intell 52, 6853–6865 (2022). https://doi.org/10.1007/s10489-021-02750-3

  • DOI: https://doi.org/10.1007/s10489-021-02750-3
