Abstract
Deep reinforcement learning (DRL) has demonstrated its potential for solving complex manufacturing decision-making problems, especially in contexts where the system learns over time from actual operation in the absence of training data. One interesting and challenging application for such methods is the assembly sequence planning (ASP) problem. In this paper, we propose an approach to applying DRL methods to ASP. The proposed approach introduces parametric actions into the RL environment to improve training time and sample efficiency, and uses two different reward signals: (1) the user’s preferences and (2) the total assembly duration. The user’s preferences signal addresses the difficulties and non-ergonomic properties of the assembly faced by the human, while the total assembly duration signal enforces the optimization of the assembly. Three of the most powerful DRL methods were studied, Advantage Actor-Critic (A2C), Deep Q-Learning (DQN), and Rainbow, in two different scenarios: a stochastic one and a deterministic one. Finally, the performance of the DRL algorithms was compared to that of tabular Q-Learning. After 10,000 episodes, the system achieved near-optimal behaviour with tabular Q-Learning, A2C, and Rainbow. However, in more complex scenarios, tabular Q-Learning is expected to underperform in comparison to the other two algorithms. The results support the potential of deep reinforcement learning for assembly sequence planning problems with human interaction.
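To make the two key ideas in the abstract concrete, the sketch below shows a minimal, hypothetical ASP-style environment with parametric actions (a mask restricting the agent to parts not yet assembled) and a reward that combines a user-preference score with a time penalty. All class names, durations, preference values, and weights are illustrative assumptions, not the paper's actual environment or reward design (the paper treats the two signals as separate reward formulations):

```python
# Illustrative sketch only: parametric (masked) actions plus a combined
# preference/duration reward for a toy assembly-sequencing environment.
import numpy as np

class ToyAssemblyEnv:
    def __init__(self, n_parts=4, seed=0):
        self.n_parts = n_parts
        rng = np.random.default_rng(seed)
        # Hypothetical per-part assembly durations and user-preference scores.
        self.durations = np.linspace(1.0, 2.0, n_parts)
        self.preferences = rng.uniform(0.0, 1.0, n_parts)
        self.reset()

    def reset(self):
        self.assembled = np.zeros(self.n_parts, dtype=bool)
        return self.assembled.astype(np.float32)

    def action_mask(self):
        # Parametric actions: only parts not yet assembled are valid,
        # so the agent never wastes samples on invalid choices.
        return ~self.assembled

    def step(self, part):
        assert self.action_mask()[part], "invalid action"
        self.assembled[part] = True
        # Combined signal: preference reward minus a time penalty
        # (the 0.1 weight is an arbitrary illustrative choice).
        reward = self.preferences[part] - 0.1 * self.durations[part]
        done = bool(self.assembled.all())
        return self.assembled.astype(np.float32), reward, done

env = ToyAssemblyEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    valid = np.flatnonzero(env.action_mask())
    obs, r, done = env.step(valid[0])  # placeholder policy: first valid part
    total_reward += r
```

In a real DRL setup, the mask would typically be passed to the policy network so that invalid actions receive zero probability, which is one common way parametric actions improve sample efficiency.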
Data availability
No data are available other than those reported in the document.
Code availability
There is no code available.
Funding
This research was partially supported by project PRODUTECH4S&C (46102) by UE/FEDER through the program COMPETE 2020 and the Portuguese Foundation for Science and Technology (FCT): COBOTIS (PTDC/EME-EME/32595/2017) and UIDB/00285/2020.
Author information
Contributions
Miguel Neves implemented the methods and conducted the testing. Pedro Neto defined the initial approach and managed the experimental tests.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Neves, M., Neto, P. Deep reinforcement learning applied to an assembly sequence planning problem with user preferences. Int J Adv Manuf Technol 122, 4235–4245 (2022). https://doi.org/10.1007/s00170-022-09877-8