
1 Introduction

Industrial robots are widely used in manufacturing systems to handle increasingly complex tasks because they provide fast and precise execution of repetitive operations [1]. Nevertheless, robots lack the flexibility and adaptability of humans, and therefore, recent robotics research has focused on human-robot collaboration (HRC), which combines precision and flexibility in manufacturing systems [2].

The places that humans share with robots are called HRC workspaces [2]. Whenever robots work alongside humans in HRC workspaces, safety concerns apply, and safety requirements are laid down in numerous standards (e.g. ISO 10218). In conventional scenarios, robots are separated from humans by dedicated equipment such as protective fences. Since HRC workspaces require humans and robots to coexist in one place, new safety concerns arise, and the former separation-based safety concepts can no longer be maintained in HRC workspaces.

To address the resulting safety problem in HRC workspaces, two major categories of measures are commonly applied [3]. The first category aims to minimize the injury risk when collisions between humans and robots cannot be avoided. Measures in this category include mechanical compliance systems (e.g., viscoelastic coverings [4] or mechanical absorption systems [5]), lightweight robot structures (e.g. [6]), and safety strategies based on collision or contact detection (e.g. [7]). Commercial robots applied in HRC workspaces usually offer one or several of these features [3]. The second category comprises measures that achieve active collision avoidance. These measures capture information about the robot motion and the human operations using vision systems or other sensing modules and, based on this information, generate alternative trajectories to avoid the forecasted collision [3]. Works on collision avoidance based on different sensors can be found in [8,9,10,11,12,13].

In addition to sensor-based approaches, deep reinforcement learning (RL) is another important approach to realize collision-free path planning in HRC workspaces. RL is a subclass of machine learning and consists of two main parts, the agent and the environment [14]. The agent receives a representation of the current state of the environment and selects an action based on a policy. Once the action is performed, the agent receives a reward. The agent aims at learning a policy that maximizes the total discounted future reward [15]. RL has been used successfully in various application fields such as solving complex games [16], job shop scheduling [17], and factory layout planning [18]. Regarding collision problems in HRC workspaces, implementations of RL can be found in a number of studies, e.g. [19,20,21].
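For illustration, the following minimal sketch shows this generic agent-environment loop; the toy environment and random policy are placeholders for illustration only and are not taken from the cited works.

```python
import random

# Minimal sketch of the agent-environment loop described above.
# The toy environment and the random policy are illustrative placeholders.

class ToyEnvironment:
    """1-D toy task: the agent must reach position 5 starting from 0."""
    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):                      # action is -1 or +1
        self.position += action
        done = self.position == 5
        reward = 10.0 if done else -1.0          # penalize every extra step
        return self.position, reward, done

def random_policy(state):
    return random.choice([-1, +1])

def run_episode(env, policy, gamma=0.99, max_steps=100):
    state = env.reset()
    discounted_return, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                   # agent selects an action based on its policy
        state, reward, done = env.step(action)   # environment returns reward and next state
        discounted_return += discount * reward   # accumulate the total discounted future reward
        discount *= gamma
        if done:
            break
    return discounted_return

print(run_episode(ToyEnvironment(), random_policy))
```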

However, in existing approaches to RL-based collision avoidance, the robots in the considered HRC workspaces are smaller than humans, and the case in which the robot is larger than the human is not considered. When the robot is small, even if a collision cannot be avoided, it mostly occurs at the human's hands or arms, which does not entail a high risk of fatal injuries. When the robot is larger than the human, however, the collision may occur at the head or torso, resulting in a higher risk of fatal injuries. Therefore, we focus on the case in which the robot is larger than the human and propose an intelligent robotic arm path planning (IRAP2) framework. The IRAP2 framework and its application case are explained in the remainder of the paper.

2 Problem Statement and Methodology

2.1 Problem Statement

In our case, we scaled the scenario down to a desktop scenario, as depicted in Fig. 1. In our desktop-level HRC workspace, the height of the robot base is 138 mm, and the lengths of the first and second connecting links are 135 mm and 147 mm, respectively. Neglecting the degree of freedom (DoF) of the attached vacuum gripper, the robot has 3 DoF, labeled J1 to J3 in Fig. 1. The movement ranges of J1, J2, and J3 are (−135° to +135°), (0° to +85°), and (−10° to +95°), respectively, and the maximum rotation speed of the joints is 320°/s. To make the human model compatible with the small robot, the height of the human is downscaled to 129 mm.
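To make these kinematic constraints explicit, they can be collected in a small configuration structure and used to validate candidate joint angles; the following sketch is purely illustrative and not part of the actual control software.

```python
# Illustrative only: the joint limits of the desktop robot described above,
# expressed as a small configuration used to validate candidate joint angles.

JOINT_LIMITS_DEG = {           # (min, max) rotation in degrees
    "J1": (-135.0, 135.0),
    "J2": (0.0, 85.0),
    "J3": (-10.0, 95.0),
}
MAX_JOINT_SPEED_DEG_S = 320.0  # maximum rotation speed of the joints

def within_limits(angles_deg):
    """Return True if every joint angle lies inside its allowed range."""
    return all(
        lo <= angles_deg[name] <= hi
        for name, (lo, hi) in JOINT_LIMITS_DEG.items()
    )

print(within_limits({"J1": 10.0, "J2": 40.0, "J3": 90.0}))   # True
print(within_limits({"J1": 140.0, "J2": 40.0, "J3": 90.0}))  # False (J1 out of range)
```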

To make the training environment as similar as possible to the real physical environment, we apply the 3D physics engine PyBullet to build a 1:1 virtual model for training the RL model in the IRAP2 framework. The virtual model consists of three parts: a virtual robot arm, a virtual human, and a virtual pick-up object, as depicted in Fig. 1. The problem is defined as finding the shortest path to pick up the blue object without colliding with the human. In this work, we consider four cases: (1) pick up the target object with no human present (as reference case); (2) pick up the object with one human standing at one specific position; (3) pick up the object with one human standing at one of two positions; and (4) pick up the object with two humans standing at two specific positions, as depicted in Fig. 1.
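A minimal sketch of how such a scene can be assembled in PyBullet is given below; the URDF file names, poses, and link indices are hypothetical placeholders for the 1:1 models described above, not the actual assets of the IRAP2 framework.

```python
import pybullet as p
import pybullet_data

# Minimal sketch of a PyBullet scene of the kind described above.
# "robot_arm.urdf" and "human_scaled.urdf" are hypothetical placeholders
# for the 1:1 models of the robot arm and the downscaled human.

p.connect(p.DIRECT)                          # headless physics server (use p.GUI to visualize)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")
robot = p.loadURDF("robot_arm.urdf", basePosition=[0, 0, 0], useFixedBase=True)    # hypothetical model
human = p.loadURDF("human_scaled.urdf", basePosition=[0.15, 0.05, 0])              # hypothetical model
target = p.loadURDF("cube_small.urdf", basePosition=[0.20, -0.10, 0.02])           # pick-up object

def gripper_distances(gripper_link=3):       # hypothetical link index of the vacuum gripper
    """Distances from the gripper to the target object and from the robot to the human."""
    gripper_pos = p.getLinkState(robot, gripper_link)[0]
    target_pos, _ = p.getBasePositionAndOrientation(target)
    d_target = sum((a - b) ** 2 for a, b in zip(gripper_pos, target_pos)) ** 0.5
    # closest distance between any robot link and the human body
    contacts = p.getClosestPoints(robot, human, distance=10.0)
    d_human = min(c[8] for c in contacts) if contacts else float("inf")
    return d_target, d_human

p.stepSimulation()
print(gripper_distances())
```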

Fig. 1. Real and virtual scenario of an HRC workspace

2.2 The IRAP2 Framework Based on DDPG

In an HRC workspace, the motion path of the robot can be regarded as a sequence of decisions and can thus be planned using the Deep Deterministic Policy Gradient (DDPG) algorithm.

Figure 2 illustrates the IRAP2 framework based on the DDPG algorithm. In the virtual 3D physics environment, the virtual robot arm is allowed to explore various positions within the described ranges and obtains a reward according to its interaction with the environment. The action, current state, next state, and reward of the virtual robot are denoted as tuples \(\{{a}_{t},{s}_{t},{s}_{t+1},{r}_{t}\}\), which are stored in the replay buffer and serve as training data for the deep neural networks. In each training iteration, a mini-batch of 64 tuples \(\{{a}_{t},{s}_{t},{s}_{t+1},{r}_{t}\}\) is sampled from the replay buffer, and a critic network with the weights \({w}_{Q}\) calculates the action value \(Q({s}_{t},{a}_{t},{w}_{Q})\), which estimates the cumulated reward of the state \({s}_{t}\) and action \({a}_{t}\). Furthermore, an actor network with the weights \({w}_{\mu }\) is used to obtain the behavioral policy \(\mu ({s}_{t},{w}_{\mu })\), which yields the action of the virtual robot for the next time step. For the stability of the training, two target networks are created for the critic and actor network, denoted as \(Q{^{\prime}}({s{^{\prime}}}_{t},{a{^{\prime}}}_{t},{w{^{\prime}}}_{Q})\) and \(\mu {^{\prime}}({s{^{\prime}}}_{t},{w{^{\prime}}}_{\mu })\), respectively. The weights of the two target networks are updated slowly in each iteration (soft update). The weights of the current critic and actor network in the DDPG algorithm are updated by minimizing the loss functions with the RMSProp optimizer. The loss functions of the actor and critic network (\(L_a\) and \(L_c\)) are expressed as follows.

$$ L_a \left( {w_\mu } \right) = \nabla_\mu Q\left( {s_t ,a_t ,w_Q } \right)\nabla_{w_\mu } \mu \left( {s_t ,w_\mu } \right) $$
(1)
$$ L_c \left( {w_Q } \right) = \left[ {r + \gamma Q^{\prime}\left( {s^{\prime}_t, \mu^{\prime}\left( {s^{\prime}_t ,w^{\prime}_\mu } \right),w^{\prime}_Q } \right) - Q\left( {s_t ,a_t ,w_Q } \right)} \right]^2 $$
(2)

In Eqs. (1) and (2), r denotes the reward, \(\gamma \) is the discount factor, and \(\nabla \) denotes the gradient. The action, state, and reward functions of the four cases are summarized in Table 1, where \({\varphi }_{1}, {\varphi }_{2},{\varphi }_{3}\) are the rotation angles of the joints J1, J2, and J3, respectively. The parameters \(\alpha , \beta ,\varepsilon ,\delta \) are scaling constants that convert the distance and reward values from the environment model in PyBullet into values suitable for training the DDPG neural networks. The parameters \({d}_{ta},{d}_{{h}_{1}},{d}_{{h}_{2}}\) are the distances of the vacuum gripper to the target object, the first human, and the second human, respectively. Finally, the parameter i is an index indicating whether the object has been picked up successfully. After the training, the optimal path of the agent can be exported to control the real robot in the HRC workspace.
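For illustration, a condensed sketch of one DDPG update step corresponding to Eqs. (1) and (2), including the soft update of the target networks and the RMSProp optimizer, is given below; the network architectures, dimensions, and hyperparameters are assumptions and not the values used in the IRAP2 framework.

```python
import copy
import torch
import torch.nn as nn

# Sketch of one DDPG update step corresponding to Eqs. (1) and (2).
# Network sizes and hyperparameters are illustrative assumptions.

STATE_DIM, ACTION_DIM, GAMMA, TAU = 6, 3, 0.99, 0.005

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)

actor_opt = torch.optim.RMSprop(actor.parameters(), lr=1e-4)    # RMSProp, as in Sect. 2.2
critic_opt = torch.optim.RMSprop(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s_next):
    """One update on a sampled mini-batch (s, a, r, s_next)."""
    # Critic update, Eq. (2): squared TD error against the target networks
    with torch.no_grad():
        a_next = actor_target(s_next)
        y = r + GAMMA * critic_target(torch.cat([s_next, a_next], dim=1))
    critic_loss = ((y - critic(torch.cat([s, a], dim=1))) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update, Eq. (1): ascend the gradient of Q w.r.t. the policy parameters
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the two target networks
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for p_, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - TAU).add_(TAU * p_.data)

# Example call on a random mini-batch of 64 transitions from the replay buffer
ddpg_update(torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM),
            torch.randn(64, 1), torch.randn(64, STATE_DIM))
```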

Fig. 2. Illustration of the IRAP2 framework

Table 1. State and reward functions
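Since the exact expressions of Table 1 are not reproduced here, the following sketch merely illustrates how the quantities introduced above (\({d}_{ta},{d}_{{h}_{1}},{d}_{{h}_{2}}\), the scaling constants, and the pick-up index i) could be combined into a shaped reward; its structure and weights are assumptions for illustration only, not the reward functions of the framework.

```python
# Hypothetical reward-shaping sketch. Table 1 defines the actual state and
# reward functions of the four cases; the structure and weights below are
# assumptions for illustration only.

ALPHA, BETA, EPSILON, DELTA = 10.0, 5.0, 5.0, 100.0   # illustrative scaling constants

def shaped_reward(d_ta, d_h1=None, d_h2=None, picked_up=False, collided=False):
    """Reward approaching the target object while staying clear of the humans."""
    reward = -ALPHA * d_ta                      # closer to the target object -> higher reward
    for d_h, scale in ((d_h1, BETA), (d_h2, EPSILON)):
        if d_h is not None:
            reward -= scale / (d_h + 1e-3)      # penalize proximity to each human present
    if collided:
        reward -= DELTA                         # strong penalty for a collision
    if picked_up:                               # index i: successful pick-up
        reward += DELTA
    return reward
```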

3 Results and Discussion

3.1 Evaluation of the Training Process

The first result is the training performance. Figure 3 shows the number of steps (one step corresponds to one action of the agent) and the reward versus the training episodes. A training episode is closed as soon as the target object has been picked up; in addition, the number of steps per episode is limited to 300. Figure 3 shows that in the first case, the agent was not able to grasp the target object until about 200 episodes. From about 200 to 300 episodes, the number of steps is reduced to approximately 100, but the optimal path is not yet found. The reward plot shows that, in the first case, the agent consistently reaches the optimal grasping path after about 300 episodes. In the second case, the step and reward plots clearly show that the agent consistently reaches the optimal path after about 200 episodes. In the third and fourth cases, stable optimal path generation is not achieved until more than 500 episodes.
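The episode handling described above (closing an episode once the object is picked up, otherwise cutting it off after 300 steps) can be sketched as follows; env and agent stand for the PyBullet environment and the DDPG agent of Sect. 2.2 and are placeholders here.

```python
# Sketch of the episode loop used during training: an episode ends when the
# object is picked up or after a maximum of 300 steps. `env` and `agent` are
# placeholders for the PyBullet environment and the DDPG agent of Sect. 2.2.

MAX_STEPS_PER_EPISODE = 300

def run_training_episode(env, agent):
    state = env.reset()
    episode_reward, steps = 0.0, 0
    for steps in range(1, MAX_STEPS_PER_EPISODE + 1):
        action = agent.act(state)                        # actor network plus exploration noise
        next_state, reward, picked_up = env.step(action)
        agent.store(state, action, reward, next_state)   # append tuple to the replay buffer
        agent.update()                                   # one DDPG update on a 64-tuple mini-batch
        episode_reward += reward
        state = next_state
        if picked_up:                                    # close the episode once the object is grasped
            break
    return steps, episode_reward                         # quantities plotted in Fig. 3
```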

Comparing all cases, it can be concluded that the IRAP2 framework achieves the highest training efficiency when there is only one human standing at a fixed position (i.e., in the second case). The third and fourth cases involve more possible positions or more humans, and therefore, the agent needs more episodes. Moreover, the minimum number of steps required to pick up the object is about 30, and the maximum reward is about 150.

Fig. 3. Plots of the steps and rewards versus training episodes

3.2 The Optimal Pick-Up Path Generated by IRAP2 Framework

The optimal pick-up paths for the four cases are shown in Fig. 4. In the 3D plots (diagonal view), each purple or light-blue line frame describes the links between the robot's joints and represents one state of the agent, as outlined for the first case in Fig. 4. The 2D line plots below the 3D plots show the grasping path of the gripper from the top view. The 3D plots show that in the first case, joint J1 rotates directly counterclockwise around the z-axis, the robot's two links descend almost in a straight line, and the gripper picks up the target object directly. In the second case, the gripper is held high and moves parallel while it is close to the human (highlighted by the green box), which reflects the robot's decision to avoid a collision with the human. In the third case, the purple lines represent the robot's path when the human appears at position P1 (yellow human), while the light-blue lines represent the path when the human appears at position P2 (red human). Since the red human stands further outward (in the positive direction of the x-axis), the path of the light-blue lines also lies slightly further outward than the path of the purple lines. In the fourth case, two humans are standing in the HRC workspace at the same time, and the robot picks up the target object by swinging its arm around the two humans from the outside (in the positive direction of the x-axis) in order to avoid a collision. The 2D plots confirm that the gripper successfully avoids collisions with the humans.

Fig. 4. The optimal pick-up paths generated by the IRAP2 framework

3.3 Validation of the Robot Control

Finally, the generated optimal paths were successfully applied in the four scenarios. Figure 5 shows the control process in the second case as an example.

Fig. 5. Validation of the optimal path in the case 'pick up the object with one human standing'

In this case, it can be seen that from 0 to 2 s, the robot arm lifts upward to avoid a collision with the human. From 2 to 10 s, the robot arm moves around the human and picks up the target object. Without our algorithm, the robot would move in a straight line directly toward the target object and crash into the human model. With this validation, we successfully confirmed the feasibility of the IRAP2 framework.

4 Conclusion and Outlook

In this work, we have confirmed the feasibility of the IRAP2 framework and the DDPG algorithm for generating optimal robot paths in HRC workspaces. Moreover, in a desktop HRC workspace scenario, we studied the case in which the robot is larger than the humans and considered different working conditions. In terms of future work, it is suggested to upscale the implementation scenario of the IRAP2 framework to a real industrial HRC workspace. Moreover, in this work, all humans are assumed to be standing at one position or moving between two certain positions; in future work, more complex movement behavior of the humans must be considered. In addition, it should be mentioned that large robots are more dangerous than small robots, and the implementation of such HRC requires a higher level of safety measures. Furthermore, cases in which robot and human are of similar size should be considered in future work; such a problem may be more complex, since the robot and the human then have similar velocities and workspace dimensions, and the robot needs more accurate and faster response capabilities. Finally, future work should also focus on improving the computing efficiency of the algorithm and on comparing our approach with other existing optimization approaches such as genetic algorithms or with other RL approaches such as the normalized advantage function.