Online food ordering delivery strategies based on deep reinforcement learning

Zou, Guangyu; Tang, Jiafu; Yilmaz, Levent; Kong, Xiangyu

doi:10.1007/s10489-021-02750-3

Published: 17 September 2021

Volume 52, pages 6853–6865 (2022)

Applied Intelligence

Guangyu Zou ORCID: orcid.org/0000-0003-1890-6874¹,
Jiafu Tang²,
Levent Yilmaz³ &
Xiangyu Kong¹

Abstract

With the rapid development of Online-to-Offline (O2O) business, millions of transactions are performed on popular online food ordering platforms each day. Efficient dispatching of orders and dynamic adjustment of delivery routes are critical to the success of O2O platforms. However, the vast volume of transactions and the computational complexity of delivery routing pose significant challenges to efficient order dispatching. The action of dispatching an order and the resulting state transition of the couriers form a Markov decision process (MDP), and reinforcement learning has proven capable of handling MDPs. This paper proposes a Double Deep Q-Network (Double-DQN) based reinforcement learning framework that gradually tests and learns the order dispatching policy by interacting with an O2O simulation model developed with SUMO. Preliminary experimental results using real order data demonstrate the effectiveness and efficiency of the proposed Double-DQN based order dispatcher. In addition, different state encoding schemes are designed and tested to improve the performance of the Double-DQN based dispatcher.
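
The abstract does not give the update rule, so the following is only a minimal sketch of the standard Double-DQN learning target, not the authors' implementation. It assumes a discrete action set (for instance, one action per candidate courier, which is an assumption made here for illustration) and uses hypothetical names (`double_dqn_target`, `q_online_next`, `q_target_next`, `gamma`). The online network selects the next action, and the target network evaluates it.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Return the Double-DQN target for one transition (s, a, r, s').

    q_online_next: Q-values of the online network at s', one per action.
    q_target_next: Q-values of the target network at s', one per action.
    All names and the default discount factor are illustrative assumptions.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


# Hypothetical values for a state with three candidate actions.
target = double_dqn_target(
    reward=1.0,
    done=False,
    q_online_next=[1.0, 3.0, 2.0],
    q_target_next=[0.5, 1.5, 4.0],
    gamma=0.9,
)
```

With these made-up numbers the online network picks action 1, so the target bootstraps from the target network's value 1.5 rather than from its maximum 4.0. Decoupling selection from evaluation in this way is what distinguishes Double-DQN from plain DQN and is intended to reduce overestimation of Q-values.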

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grant Nos. 71771035 and 71831003).

Author information

Authors and Affiliations

Dalian University of Technology, Dalian, China
Guangyu Zou & Xiangyu Kong
Dongbei University of Finance & Economics, Dalian, China
Jiafu Tang
Auburn University, Auburn, AL, USA
Levent Yilmaz

Corresponding author

Correspondence to Guangyu Zou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zou, G., Tang, J., Yilmaz, L. et al. Online food ordering delivery strategies based on deep reinforcement learning. Appl Intell 52, 6853–6865 (2022). https://doi.org/10.1007/s10489-021-02750-3

Accepted: 05 August 2021
Published: 17 September 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s10489-021-02750-3
