Learning practically feasible policies for online 3D bin packing

Zhao, Hang; Zhu, Chenyang; Xu, Xin; Huang, Hui; Xu, Kai

doi:10.1007/s11432-021-3348-6

Research Paper
Published: 27 December 2021

Volume 65, article number 112105, (2022)
Science China Information Sciences

Hang Zhao¹, Chenyang Zhu¹, Xin Xu², Hui Huang³ & Kai Xu¹

Abstract

We tackle the online 3D bin packing problem (3D-BPP), a challenging yet practically useful variant of the classical bin packing problem. In this problem, items are delivered to the agent without revealing the full sequence in advance. The agent must pack each item directly and stably into the target bin without changing the arrival order, and no further adjustment is permitted. Online 3D-BPP can be naturally formulated as a Markov decision process (MDP). We adopt deep reinforcement learning, in particular the on-policy actor-critic framework, to solve this MDP with a constrained action space. To learn a practically feasible packing policy, we propose three critical designs. First, we propose an online analysis of packing stability based on a novel stacking tree. It attains high analysis accuracy while reducing the computational complexity from O(N²) to O(N log N), making it especially suited for reinforcement learning training. Second, we propose decoupled packing policy learning for the different dimensions of placement, which enables high-resolution spatial discretization and hence high packing precision. Third, we introduce a reward function that directs the robot to place items in a far-to-near order and therefore simplifies collision avoidance in the motion planning of the robotic arm. Furthermore, we provide a comprehensive discussion of several key implementation issues. Extensive evaluation demonstrates that our learned policy significantly outperforms state-of-the-art methods and is practically usable for real-world applications.
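
To make the idea of a constrained action space concrete, the following is a minimal sketch, not the authors' method. It assumes the bin state is stored as a discrete height map and that items are axis-aligned boxes with integer sizes. Stability is approximated by a simple support-area ratio, which is only a stand-in for the stacking-tree analysis described in the abstract. The names feasibility_mask, place and min_support are hypothetical.

```python
import numpy as np

def feasibility_mask(height_map, item, bin_height, min_support=0.8):
    """Boolean mask over placement cells (x, y) for one incoming item.

    Assumptions (not from the paper): the state is an integer height map,
    and a placement counts as stable when at least `min_support` of the
    item's footprint rests on the highest surface beneath it.
    """
    length, width = height_map.shape
    l, w, h = item
    mask = np.zeros((length, width), dtype=bool)
    for x in range(length - l + 1):
        for y in range(width - w + 1):
            patch = height_map[x:x + l, y:y + w]
            base = patch.max()              # item rests on the tallest cell
            if base + h > bin_height:       # would stick out of the bin
                continue
            support = np.mean(patch == base)  # fraction of footprint supported
            mask[x, y] = support >= min_support
    return mask

def place(height_map, item, x, y):
    """Return a new height map after putting the item at cell (x, y)."""
    l, w, h = item
    new_map = height_map.copy()
    base = new_map[x:x + l, y:y + w].max()
    new_map[x:x + l, y:y + w] = base + h
    return new_map
```

In a policy-learning setting, a mask of this kind would typically be used to rule out infeasible placements before an action is sampled; the items still have to be placed in their arrival order, one call to place per item.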

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2018AAA0102200), the National Natural Science Foundation of China (Grant Nos. 62132021, 61825305, 62002375, 62002376, 62102435), NUDT Research Grants (Grant No. ZK19-30), the DEGP Key Project (Grant No. 2018KZDXM058), the GD Science and Technology Program (Grant No. 2020A0505100064), and the Shenzhen Science and Technology Program (Grant No. JCYJ20210324120213036). We thank Qijin SHE, Yin YANG, Kun HUANG, Yixing LAN, Kaiwen LI, Junkai REN, and Yao DUAN for active discussions. We also thank Hanchi HUANG for maintaining a good community for discussing reinforcement-learning-related technologies.

Author information

Hang Zhao and Chenyang Zhu contributed equally to this work.

Authors and Affiliations

School of Computer Science, National University of Defense Technology, Changsha, 410073, China
Hang Zhao, Chenyang Zhu & Kai Xu
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, 410073, China
Xin Xu
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
Hui Huang

Corresponding author

Correspondence to Kai Xu.

About this article

Cite this article

Zhao, H., Zhu, C., Xu, X. et al. Learning practically feasible policies for online 3D bin packing. Sci. China Inf. Sci. 65, 112105 (2022). https://doi.org/10.1007/s11432-021-3348-6

Received: 08 July 2021
Revised: 20 August 2021
Accepted: 22 September 2021
Published: 27 December 2021
DOI: https://doi.org/10.1007/s11432-021-3348-6
