Abstract
Robotics has boomed in recent years. With the development of artificial intelligence, more and more researchers have devoted themselves to the field, yet robot multi-task operation still has many shortcomings. Reinforcement learning has performed well in manipulator control, especially grasping, but grasping is only the first step of a robot's action sequence, and the stacking, assembly, and placement tasks that follow are often ignored. Such long-horizon tasks still face the problems of high time cost, dead-end exploration, and process reversal. Hierarchical reinforcement learning offers some advantages in solving these problems, but not every task can be learned hierarchically. This paper addresses complex, continuous multi-action manipulation tasks by improving hierarchical reinforcement learning, proposing a framework aimed at long-horizon tasks such as stacking and alignment. In simulation experiments on a variety of tasks, our framework raises the success rate of clearing cluttered toys from 78.3% to 94.8%, and in the toy-stacking experiment it trains nearly three times faster than the baseline method. The method also generalizes to other long-horizon tasks; experiments show that the more complex the task, the greater the advantage of our framework.
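To illustrate the temporal abstraction that makes hierarchy attractive for long-horizon manipulation, the following is a minimal generic sketch, not the paper's actual framework: a high-level policy selects among manipulation primitives ("options"), and each primitive stands in for many low-level motor steps. The environment, primitive names, and the trivial rule-based high-level policy are all hypothetical placeholders for learned components.

```python
class ToyStackEnv:
    """Hypothetical toy task: move all blocks from the table into a stack.

    Each primitive call abstracts away many low-level motor commands,
    which is the core idea behind hierarchical control.
    """

    def __init__(self, n_blocks=3):
        self.on_table = list(range(n_blocks))
        self.stack = []
        self.holding = None

    def run_primitive(self, name):
        # "grasp" and "place" are placeholder primitives; a real system
        # would execute a learned low-level policy here.
        if name == "grasp" and self.holding is None and self.on_table:
            self.holding = self.on_table.pop()
        elif name == "place" and self.holding is not None:
            self.stack.append(self.holding)
            self.holding = None

    def done(self):
        return not self.on_table and self.holding is None


def high_level_policy(env):
    # Trivial rule standing in for a learned high-level policy:
    # grasp when the hand is empty, otherwise place.
    return "grasp" if env.holding is None else "place"


def rollout(env, max_options=20):
    # The high level makes only ~2 decisions per block, instead of
    # reasoning over hundreds of raw motor commands.
    trace = []
    while not env.done() and len(trace) < max_options:
        option = high_level_policy(env)
        trace.append(option)
        env.run_primitive(option)
    return trace


env = ToyStackEnv(3)
trace = rollout(env)
print(trace)       # alternates grasp/place until all blocks are stacked
print(env.stack)   # all three blocks end up in the stack
```

The point of the sketch is the shortened decision horizon: the long task is solved in six high-level choices, each expanding into a (here, collapsed) low-level behavior.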
Availability of data and material
The authors have temporarily withheld availability of the data.
Code availability
The authors have temporarily withheld availability of the code. Upon publication, the authors will post the code, manuscript, and related video files on their GitHub page. A real-environment test video has been released on YouTube at: https://youtu.be/KHty3w9O7d8.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 62373086 and 62373087, and in part by the Liaoning Revitalization Talents Program under Grant XLYC2203013.
Author information
Contributions
Fei Wang and Yue Liu conceived the project. Yue Liu and Manyi Shi conducted the experiments in the simulation environment and collected the test data. Chao Chen and JinBiao Zhu completed the real-world part of the experiments. Fei Wang and Yue Liu analyzed the data and wrote the manuscript. Yue Liu and Shangdong Liu provided valuable comments. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
This article does not contain any studies with human participants performed by any of the authors.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent for publication
The authors declare that they consent to publication.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, F., Liu, Y., Shi, M. et al. Efficient Stacking and Grasping in Unstructured Environments. J Intell Robot Syst 110, 57 (2024). https://doi.org/10.1007/s10846-024-02078-3