Deep reinforcement learning applied to an assembly sequence planning problem with user preferences

  • ORIGINAL ARTICLE
The International Journal of Advanced Manufacturing Technology

Abstract

Deep reinforcement learning (DRL) has demonstrated its potential in solving complex manufacturing decision-making problems, especially where the system learns over time from actual operation in the absence of training data. One interesting and challenging application for such methods is the assembly sequence planning (ASP) problem. In this paper, we propose an approach to the implementation of DRL methods in ASP. The proposed approach introduces parametric actions into the RL environment to improve training time and sample efficiency, and uses two different reward signals: (1) the user's preferences and (2) the total assembly duration. The user's preference signal addresses the difficulties and non-ergonomic aspects of the assembly faced by the human, while the total assembly time signal drives the optimization of the assembly. Three of the most powerful deep RL methods were studied, Advantage Actor-Critic (A2C), Deep Q-Learning (DQN), and Rainbow, in two different scenarios: a stochastic one and a deterministic one. Finally, the performance of the DRL algorithms was compared to that of tabular Q-Learning. After 10,000 episodes, the system achieved near-optimal behaviour with tabular Q-Learning, A2C, and Rainbow. However, for more complex scenarios, tabular Q-Learning is expected to underperform compared to the other two algorithms. The results support the potential of deep reinforcement learning for assembly sequence planning problems with human interaction.
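Although the paper's environment and training code are not available (see the code availability statement below), the core ideas in the abstract, a parametric (masked) action space over assembly operations and a reward that combines the user-preference signal with the assembly-time signal, can be illustrated with a minimal sketch. The part names, durations, preference penalties, and precedence constraints below are hypothetical placeholders, and the tabular Q-Learning loop is only the baseline comparator, not the A2C/DQN/Rainbow agents studied in the paper.

```python
import random
from dataclasses import dataclass, field

# Hypothetical data (not from the paper): part names, nominal assembly
# durations in seconds, and a user-preference penalty for parts that are
# awkward or non-ergonomic for the human operator to handle.
PARTS = ["base", "bracket", "gearbox", "cover"]
DURATION = {"base": 10.0, "bracket": 6.0, "gearbox": 14.0, "cover": 4.0}
PREF_PENALTY = {"base": 0.0, "bracket": 1.0, "gearbox": 3.0, "cover": 0.5}
# Illustrative precedence constraints: a part may only be mounted once all
# of its predecessors have been assembled.
PRECEDENCE = {"base": set(), "bracket": {"base"}, "gearbox": {"base"},
              "cover": {"bracket", "gearbox"}}


@dataclass
class AssemblyEnv:
    """Toy assembly-sequence environment with a parametric (masked) action space."""
    assembled: frozenset = field(default_factory=frozenset)

    def reset(self):
        self.assembled = frozenset()
        return self.assembled

    def action_mask(self):
        # Parametric actions: only parts that are not yet mounted and whose
        # predecessors are satisfied are selectable, which shrinks the
        # effective action space and improves sample efficiency.
        return [p not in self.assembled and PRECEDENCE[p] <= self.assembled
                for p in PARTS]

    def step(self, part):
        assert self.action_mask()[PARTS.index(part)], "invalid action"
        self.assembled = self.assembled | {part}
        # Two reward signals combined as a weighted sum: (1) the user's
        # preference (ergonomics) penalty and (2) the assembly duration.
        reward = -PREF_PENALTY[part] - 0.1 * DURATION[part]
        done = len(self.assembled) == len(PARTS)
        return self.assembled, reward, done


def q_learning(episodes=10_000, alpha=0.1, gamma=0.95, eps=0.1):
    """Epsilon-greedy tabular Q-Learning baseline over the masked environment."""
    q = {}  # maps (state, part) -> action value
    env = AssemblyEnv()
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            valid = [p for p, ok in zip(PARTS, env.action_mask()) if ok]
            if random.random() < eps:
                part = random.choice(valid)
            else:
                part = max(valid, key=lambda p: q.get((state, p), 0.0))
            next_state, reward, done = env.step(part)
            next_valid = [p for p, ok in zip(PARTS, env.action_mask()) if ok]
            target = reward if done else reward + gamma * max(
                q.get((next_state, p), 0.0) for p in next_valid)
            old = q.get((state, part), 0.0)
            q[(state, part)] = old + alpha * (target - old)
            state = next_state
    return q


if __name__ == "__main__":
    q_table = q_learning()
    print(sorted(q_table.items(), key=lambda kv: -kv[1])[:5])
```

With deep agents such as A2C, DQN, or Rainbow, the same mask would typically be applied to the network outputs (for example, by assigning invalid actions a large negative logit or Q-value), whereas the tabular baseline above enumerates states explicitly, which is one reason it is expected to scale poorly to more complex products.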

Data availability

No data are available other than those reported in the document.

Code availability

There is no code available.

Funding

This research was partially supported by project PRODUTECH4S&C (46102), funded by UE/FEDER through the COMPETE 2020 programme, and by the Portuguese Foundation for Science and Technology (FCT) under projects COBOTIS (PTDC/EME-EME/32595/2017) and UIDB/00285/2020.

Author information

Contributions

Miguel Neves implemented the methods and conducted the testing. Pedro Neto defined the initial approach and managed the experimental tests.

Corresponding author

Correspondence to Pedro Neto.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Neves, M., Neto, P. Deep reinforcement learning applied to an assembly sequence planning problem with user preferences. Int J Adv Manuf Technol 122, 4235–4245 (2022). https://doi.org/10.1007/s00170-022-09877-8
