Introduction

People always pursue high-efficiency performance in emergencies while ensuring personal safety [1]. For example, in fire-rescue work, putting out fires, smoke-diving and handling patients and heavy tools are typical tasks in which good balance ability is critical for guaranteeing safety and task productivity [2]. Uncertain fire conditions and the extensive use of protective equipment further increase the challenges placed on the balance control system. Although existing telemanipulation is a highly efficient master–slave work pattern thanks to human-in-the-loop control, it is not trivial for human operators to specify the optimal measures to guarantee robotic safety [3,4,5].

Humans balance safety against performance in a coordinated and fast manner even when a work-related accident is not anticipated. Unfortunately, robots are usually not trained with such knowledge and cannot take appropriate measures to deal with potential emergencies during telemanipulation tasks. Beyond the basic capabilities of moving and acting autonomously, it is also essential to ensure the robots’ survival by protecting them from harmful states or collisions when they physically interact with their workspace [6,7,8]. This is especially true in the search-and-rescue process [9,10,11], where dexterous robotic task execution easily encounters uncertainties coming from the environment in real human–robot collaborative manipulation. Moreover, performing a task may require robots to open doors first and then gain access to a new workspace for their end-effector operations [12,13,14,15]. One example is using robots to manipulate the electronic equipment in power stations [16,17,18,19,20]. Typical power station operations involve a large number of refrigerator-like electric cabinets equipped with electronic monitoring, which need to be checked at close range or switched by hand after opening the cabinet door. As in many robot–environment interactive processes, using robots for further operations in the cabinet’s internal workspace released by door-opening is susceptible to environmental uncertainty such as wind force. The uncertain force can drive the opened door toward closing and lead to collision damage, threatening the robot’s mechanical safety and work performance. In such cases, the balance ability is essential to interrupt the current manipulation and display the corresponding self-protection at appropriate times.

In this work, we propose a novel approach for balancing robotic safety against manipulation performance by planning safety-critical motions and control during work emergencies in the door-closing scenario. Specifically, a dynamic disturbance model of the restricted workspace released by door-opening is established. Then, the workspace and robot interactions are analyzed using a partially observable Markov decision process (POMDP), so that the balance mechanism is executed as belief tree planning. To perform the planning, besides the telemanipulation actions, we define three other types of safety-guaranteed actions for self-protection: on guard, escape, and defense, triggered by estimated collision risk levels. Finally, we propose three motion controllers based on risk-time optimization to execute the planned self-protective actions.

The main contributions of this paper are summarized as follows:

  1. To our knowledge, this paper presents the first evaluative framework to balance robotic safety against operation performance during dynamic interactions in a door-closing workspace, taking into account the collision risk arising from environmental uncertainty;

  2. Apart from the manipulation actions, this paper defines three other safety-guaranteed actions: on guard, elbow defense against the door, and escape out, corresponding to low, middle and high collision risk levels, to enact the balance policy; this is verified by experiments on our self-built robot platform;

  3. Additionally, this paper provides guidance for safe manipulation, for dealing with emergencies in a class of rescue-robot operations, and for upgrading motion planning.

The rest of this paper is organized as follows. Related works are described in “Related work”. “Workspace construction and problem formulation” explains the workspace construction and problem formulation. A novel method for balancing safety against performance is proposed in “Proposed method”. “Experiments and results” validates the efficiency of the proposed method through experiments. Finally, conclusions are drawn in “Conclusions”.

Related works

Related works about emergency measures, balance mechanisms and workspace construction are briefly introduced in this section.

Studies on robotic emergency measures for self-protection concern both control and planning. From the perspective of reflex-based control, self-protective behaviors are categorized as state–action associations, which traditionally depend on the subsumption architecture [21]. In this paradigm, the robot can quickly react to a stimulus since the sensory input from the dynamic environment directly triggers the coupled action from a wide variety of measures. Given this, many studies have focused on time-delay compensation [22, 23] or reflex-based self-protective patterns and successfully applied them to humanoid robots, e.g., the grasp reflex [24] and the slip reflex [25]. For them, facing accidental collision risks, generating and maintaining stable controllers are the preventive measures. From the perspective of motion planning in constrained environments [26], the self-protective response is to avoid one or more dynamic obstacles with uncertain motion patterns [27]. There, it is necessary to plan smooth and collision-free orbits or trajectories to perform the desired task [28]. In that sense, self-protection to guarantee safety is based on planning paths around the dynamic obstacles to avoid collisions.

The robotic balance mechanism is the prerequisite policy-decision process for taking emergency measures, knowing when to pursue high-efficiency performance and when to prefer a security guarantee. Traditionally, owing to the exclusive pursuit of the best performance value, this mechanism is so inflexible, even redundant, that it tends to be ignored in both subjective and objective respects. Similarly, the exclusive pursuit in the other extreme case is absolute security.

The aforementioned control and planning technologies have been applied to a typical workspace released by door-opening [29, 30], which has also received abundant attention during the last decades. When door-opening actions are executed in practice [31], external disturbances on the opened door, such as uncontrolled rotational inertia producing closing trends, are unavoidably encountered. Unless dealt with properly, they deteriorate the performance of the following operations and can even give rise to inconsistent task results, leading to mission failure. In some cases, researchers treat the unlocked door driven by external disturbances according to the difficulty of the further task. For easy tasks such as opening a door only to traverse it [32, 33], there is no need to care much about the unlocked door’s state because the robot passes through quickly after door-opening. However, complex tasks such as opening a door to perform handwork inside [34, 35] are generally more time-consuming and require more operating precision. We cannot ignore the uncertain disturbances [36] coming from the unlocked door, which lead to a potential risk of collision damage. Compared with the simple task mentioned above, handwork inside a cabinet limits the end-effector’s workspace and keeps the robot within the unlocked door’s adverse influence range for a long time. To solve this problem, roboticists initially adopted a dual-arm mobile manipulator scheme [37, 38]: using one arm to defend against the unlocked door’s closing disturbances while planning another arm to work inside. They applied this pattern to an expensive PR2 (Personal Robot 2) to fetch a beer from a refrigerator [39]. The scheme could even be extended to multi-arm robot systems; unfortunately, it is not suitable for robots with only one arm.

In this work, we focus on a single-arm mobile manipulator in human–robot collaborative manipulation responding to emergencies, where the unlocked door produces closing disturbances during handwork after door-opening.

Workspace construction and problem formulation

Consider a time-varying workspace \({{\mathcal {W}}}\left( {{t}} \right) \) released by the door-opening action, which is constrained by the door’s frame \({{\mathcal {D}}}_\mathrm{{frame}}\) and its leaf \(\mathcal{D}_\mathrm{{leaf}}\). Viewed from the top, Fig. 1 shows the dynamic interaction process: \({{\mathcal {W}}}\left( {{t}} \right) \) shrinks like a Chinese folding fan when \({{{\mathcal {D}}}_\mathrm{{leaf}}}\) is driven by a force such as a sudden wind \({F_\mathrm{{w}}}\left( t \right) \).

Fig. 1 The dynamic model of the restricted workspace

Meanwhile, due to the resisting force \({F_\mathrm{{r}}}\left( t \right) \) arising from rotational friction and air resistance, \(\mathcal {D}_\mathrm{{leaf}}\) would stop closing at a certain position \(p_n\). Accordingly, the state equation of \({{\mathcal {W}}}\left( {{t}} \right) \) can be written as:

$$\begin{aligned} \omega \left( t \right) = f\left[ {\theta \left( t \right) ,{F_\mathrm{{w}}}\left( t \right) - {F_\mathrm{{r}}}\left( t \right) ,t} \right] , \end{aligned}$$
(1)

where \({\theta \left( t \right) }\) denotes the angle between \(\mathcal {D}_\mathrm{{frame}}\) and \(\mathcal {D}_\mathrm{{leaf}}\), \({\omega \left( t \right) }\) denotes the angular velocity and \(f\left( \cdot \right) \) denotes a time-varying function.

Note that we do not consider the situation in which \({{\mathcal {W}}}\left( {{t}} \right) \) enlarges. Thus, standing in the fan-shaped area, the robot always faces a potential collision risk. We assume that the robotic chassis must be treated as a collision-free part because it is equipped with indispensable precision sensors. Given this, the goal is to plan a policy \(\pi \) in \({{\mathcal {W}}}\left( {{t}} \right) \) that maximizes the value function \({V\left( \pi \right) }\) and then to execute \(\pi \) under control between the start configuration \({{{\varvec{q}}}_{\varvec{0}}}\in {\mathbb {R}}^{D}\) and the goal configuration \({{{\varvec{q}}}_{\varvec{d}}}\in {\mathbb {R}}^{D}\), which can be written as:

$$\begin{aligned} \max \left\{ V\left( \pi \right) \;\big |\; \pi :{{{\varvec{q}}}_{\varvec{0}}} \rightarrow {{{\varvec{q}}}_{\varvec{d}}} \in {\mathbb {R}}^{D} \right\} \quad \mathrm{{s.t.}}\;\; {{\mathcal {W}}}\left( {{t}} \right) , \end{aligned}$$
(2)

where \({{\varvec{q}}}\) denotes the robot configuration composed of its degrees of freedom (DOFs) and D is the number of DOFs.

Proposed method

In this section, Fig. 2 shows the proposed method for balancing safety against performance, which consists of three parts, i.e., the balance mechanism, the interaction estimators and the responding measures; each is presented in detail below.

Fig. 2 Pipeline of the balancing safety against performance approach

Balance mechanism

The balance mechanism is a collaborative control scheme that, based on the risk estimators, chooses between manual and automated policy decisions to deal with the workspace.

The upper part of Fig. 2 shows the human–robot interaction for master–slave manipulation tasks after door-opening. The control system that generates action sequences involves an autonomous controller interacting over the network with the human operators. Under the received manual policy and action commands, and assuming no significant delays or communication issues between the master and the slave, the robot platform can perform dexterous manipulation in efficiency-critical applications, such as turning on a power switch, with the human in the control loop.

The lower part of Fig. 2 shows the robot–environment interaction for door-closing emergencies. In the policy-decisions block, a partially observable Markov decision process (POMDP) [26] architecture captures the interaction between the agent’s decisions and its environment, modeling our robot acting in the partially observable, stochastic, shrinking workspace. It is defined formally as a 7-tuple \(({{\mathcal {S}}},{{\mathcal {A}}},{{\mathcal {Z}}},T,O,R,{b_0})\), where:

\({{\mathcal {S}}}\): indicates a state set of \({\mathcal {D}_\mathrm{{leaf}}}\) at the current time;

\({{\mathcal {A}}}\): indicates an action set that the robot will perform at the next moment;

\({{\mathcal {Z}}}\): indicates an observation set of \({\mathcal {D}_\mathrm{{leaf}}}\) at the current time;

T: the function \(T(s,a,s') = p(s'|s,a)\) indicates the probabilistic state transition from \(s \in {{\mathcal {S}}}\) to \(s' \in {{\mathcal {S}}}\) when the robot in state \(s \in {{\mathcal {S}}}\) takes an action \(a \in {{\mathcal {A}}}\). It models our imperfect knowledge of how the state of \({\mathcal {D}_\mathrm{{leaf}}}\) changes and of the robot control;

O: the function \(O(s',a,z) = p(z|s',a)\) indicates the conditional probability of the current observation, which captures sensor noise;

R: the function \(R(s,a)\) defines a real-valued reward for the robot when it takes action \(a \in {{\mathcal {A}}}\) in state \(s \in \mathcal{S}\).

As analyzed previously, the POMDP planning aims to choose a policy \(\pi \) that maximizes its value based on \({{\mathcal {A}}}\) and \({{\mathcal {S}}}\), but \({{\mathcal {S}}}\) is not known exactly due to imperfect observation. Instead, the robot maintains a belief, which is a probability distribution over \({{\mathcal {S}}}\). The robot starts with an initial belief \({b_0}\). At time t, it infers a new belief, according to Bayes’ rule [40], by incorporating information from the action \(a_t\) taken and the observation \(z_t\) received:

$$\begin{aligned} \begin{aligned} b_{t}\left( s^{\prime }\right)&=\tau \left( b_{t-1}, a_{t}, z_{t}\right) \\&=\eta O\left( s^{\prime }, a_{t}, z_{t}\right) \sum _{s \in {{\mathcal {S}}}} T\left( s, a_{t}, s^{\prime }\right) b_{t-1}(s), \end{aligned} \end{aligned}$$
(3)

where \(\eta \) is a normalizing constant.
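To make the update concrete, the following is a minimal Python sketch under our assumptions: a hypothetical discrete representation of the 7-tuple (names such as `POMDP` and `belief_update` are ours, not the authors' implementation), with Eq. (3) computed directly:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class POMDP:
    """Discrete representation of the 7-tuple (S, A, Z, T, O, R, b0)."""
    states: List[str]                      # S: door-leaf states
    actions: List[str]                     # A: robot actions
    observations: List[str]                # Z: observed risk levels
    T: Callable[[str, str, str], float]    # T(s, a, s') = p(s' | s, a)
    O: Callable[[str, str, str], float]    # O(s', a, z) = p(z | s', a)
    R: Callable[[str, str], float]         # R(s, a): immediate reward
    b0: Dict[str, float]                   # initial belief over S

def belief_update(pomdp: POMDP, b_prev: Dict[str, float], a: str, z: str) -> Dict[str, float]:
    """Bayes update of Eq. (3): b_t(s') = eta * O(s', a, z) * sum_s T(s, a, s') * b_{t-1}(s)."""
    b_new = {}
    for s_next in pomdp.states:
        prior = sum(pomdp.T(s, a, s_next) * b_prev[s] for s in pomdp.states)
        b_new[s_next] = pomdp.O(s_next, a, z) * prior
    total = sum(b_new.values())            # eta in Eq. (3) is 1 / total
    if total == 0.0:
        return dict(b_prev)                # observation inconsistent with the model; keep old belief
    return {s: v / total for s, v in b_new.items()}
```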

Figure 3 shows that a POMDP policy prescribes the action at a belief. With the policy \(\pi \) and an initial belief \(b_0\), the expected value function \(V_\pi \) can be written as:

$$\begin{aligned} {V_\pi }\left( {{b_0}} \right) = E\left( {\sum \limits _{t = 0}^\infty {{\gamma ^t}R\left( {{s_t},{a_{t + 1}}} \right) \left| {{b_0},\pi } \right. } } \right) , \end{aligned}$$
(4)

where \(s_t\) is the state at time t, \(a_{t+1} = \pi (b_{t})\) is the action that the policy \(\pi \) chooses at time t, and \(\gamma \in [0, 1]\) is a discount factor. The expectation is taken over the sequence of uncertain state transitions and observations over time.
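As an illustration only, Eq. (4) can be approximated by Monte Carlo rollouts that truncate the infinite sum at a finite horizon; the sketch below assumes the `POMDP` container and `belief_update` function from the previous sketch:

```python
import random

def estimate_policy_value(pomdp: POMDP, policy, b0, horizon=20, gamma=0.95, n_rollouts=1000):
    """Monte Carlo estimate of Eq. (4): V_pi(b0) = E[ sum_t gamma^t R(s_t, a_{t+1}) | b0, pi ]."""
    total = 0.0
    for _ in range(n_rollouts):
        # sample an initial state from the belief b0
        s = random.choices(pomdp.states, weights=[b0[x] for x in pomdp.states])[0]
        b, ret = dict(b0), 0.0
        for t in range(horizon):
            a = policy(b)                                    # a_{t+1} = pi(b_t)
            ret += (gamma ** t) * pomdp.R(s, a)
            # sample the next state and an observation, then update the belief
            s_next = random.choices(pomdp.states,
                                    weights=[pomdp.T(s, a, sp) for sp in pomdp.states])[0]
            z = random.choices(pomdp.observations,
                               weights=[pomdp.O(s_next, a, zz) for zz in pomdp.observations])[0]
            b, s = belief_update(pomdp, b, a, z), s_next     # track the belief as in Eq. (3)
        total += ret
    return total / n_rollouts
```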

Fig. 3 POMDP planning performs a lookahead search on the belief tree

A key idea in POMDP planning is the belief tree [41], as shown in Fig. 3. Each node of a belief tree corresponds to a belief b. At each node, the tree branches on all actions in \({{\mathcal {A}}}\) and all observations in \({{\mathcal {Z}}}\). If a node with belief b has a child node with belief \(b'\), then \(b' = \tau ( b, a, z)\) according to Eq. (3). Conceptually, we may think of POMDP planning as a tree search in the belief space, the space of all possible beliefs that the mobile manipulator may encounter. To find an optimal plan for a POMDP, using Bellman’s equation [42], we traverse the belief tree from the bottom up and compute an optimal action recursively at each node:

$$\begin{aligned} V^*\left( b \right) \buildrel \Delta \over = \max _\pi {V_\pi }(b) = \max _{a \in {{\mathcal {A}}}} \left\{ \sum _{s \in {{\mathcal {S}}}} b\left( s \right) R\left( s,a \right) + \gamma \sum _{z \in {{\mathcal {Z}}}} p\left( z \mid b,a \right) V^*\left( b' \right) \right\} , \end{aligned}$$
(5)

where a value function \(V_\pi \) satisfies Eq. (5) if and only if the induced policy is optimal.
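The following sketch illustrates Eq. (5) as a depth-limited recursion over the belief tree (a simplified exhaustive search for illustration, not the authors' planner; it reuses the `POMDP` container and `belief_update` assumed above, and computes \(p(z\mid b,a)=\sum _{s'}O(s',a,z)\sum _{s}T(s,a,s')b(s)\) explicitly):

```python
def optimal_value(pomdp: POMDP, b: Dict[str, float], depth: int, gamma: float = 0.95) -> float:
    """Depth-limited Bellman backup of Eq. (5) over the belief tree."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in pomdp.actions:
        # expected immediate reward under the current belief b
        value = sum(b[s] * pomdp.R(s, a) for s in pomdp.states)
        for z in pomdp.observations:
            # marginal probability of observing z after taking a from belief b
            p_z = sum(pomdp.O(s_next, a, z) *
                      sum(pomdp.T(s, a, s_next) * b[s] for s in pomdp.states)
                      for s_next in pomdp.states)
            if p_z > 0.0:
                b_next = belief_update(pomdp, b, a, z)       # child node b' of the belief tree
                value += gamma * p_z * optimal_value(pomdp, b_next, depth - 1, gamma)
        best = max(best, value)
    return best
```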

In this sense, our POMDP planning is a special case of belief space planning. Belief space planning is more general and does not require the planning model to satisfy the mathematical structure of POMDPs; for example, the reward function R may depend on the belief b and not just on \({{\mathcal {S}}}\) and \({{\mathcal {A}}}\). Additionally, at each node, all observations in \({{\mathcal {Z}}}\) are key points of the search, because each resulting child node of the belief tree again branches on all possible actions in \({{\mathcal {A}}}\).

Interaction estimators

In what follows, we detail how the observations and risk estimators block shown in Fig. 2 switches the control priority to trigger the manual or automated modes mentioned above.

Figure 4 shows the observation process for the dynamic \({\mathcal {D}_\mathrm{{leaf}}}\). Let O denote the robot’s sensing position, and let \(P_i\), \(P_{i+1}\) and Q denote three marked feature points on \({\mathcal {D}_\mathrm{{frame}}}\) that are coplanar with O. \(\left| {OO'} \right| \) is parallel to \(\left| {{P_{i+1}}{G_{i+1}}} \right| \), and \(\left| {OO'} \right| = \left| {{P_{i+1}}{G_{i+1}}} \right| =\left| {{P_{i}}{G_{i}}} \right| = h\), where h denotes the height between the marked points and the ground. Likewise, \(\left| {{P_i}Q} \right| =\left| {{P_{i+1}}Q} \right| =d\), where d denotes the width of the unlocked door leaf.

Fig. 4 Observation process for \(\mathcal {D}_\mathrm{{leaf}}\)

In this case, we can obtain \(\left| {{P_i}O} \right| \), \(\left| {{P_{i+1}}O} \right| \) and \(\left| {OQ} \right| \) by measurement. According to the geometric relationship, the observed rotation angle \(\Delta {{\hat{\theta }}}\) can be written as:

$$\begin{aligned} \Delta {{\hat{\theta }}} = \angle {P_i}Q{P_{i + 1}} = \angle {P_i}QO - \angle {P_{i + 1}}QO, \end{aligned}$$
(6)

where

$$\begin{aligned} \angle {P_i}QO&= \arccos \frac{{\left| {P_i}Q \right| }^2 + {\left| OQ \right| }^2 - {\left| O{P_i} \right| }^2}{2\left| {P_i}Q \right| \cdot \left| OQ \right| } \\ \angle {P_{i+1}}QO&= \arccos \frac{{\left| {P_{i+1}}Q \right| }^2 + {\left| OQ \right| }^2 - {\left| O{P_{i+1}} \right| }^2}{2\left| {P_{i+1}}Q \right| \cdot \left| OQ \right| }. \end{aligned}$$

For \({\mathcal {D}_\mathrm{{leaf}}}\), the moment of inertia around the door axis is:

$$\begin{aligned} I = \frac{1}{3}m{d^2}, \end{aligned}$$
(7)

where m denotes the mass of \(\mathcal {D}_\mathrm{{leaf}}\). Based on Eqs. (6) and (7), the observed angular kinetic energy \({{{\hat{E}}}_{\mathcal {D}_\mathrm{{leaf}}}}\) around the door axis can be written as:

$$\begin{aligned} {{{\hat{E}}}_{\mathcal {D}_\mathrm{{leaf}}}} = \frac{1}{2}I{\hat{\omega }}^2, \end{aligned}$$
(8)

where \({{\hat{\omega }}} = \Delta {{\hat{\theta }}} / \Delta t\) and \({\Delta t}\) denotes the observation time step.

In this paper, \({{{\hat{E}}}_{\mathcal {D}_\mathrm{{leaf}}}}\) is the quantity used by the risk estimators block to switch and trigger the balance mechanism described above. Combining this with Eq. (8), we treat the risk levels coming from \({\mathcal {D}_\mathrm{{leaf}}}\) as inputs and divide them into four pre-defined levels (no risk, low risk, middle risk and high risk), which can be written as:

$$\begin{aligned} {{\mathcal {Z}}}=\left\{ \begin{array}{ll} z_0=\text{no risk} &{} {\widehat{E}}_{{{\mathcal {D}}}_{\text{ leaf }}}=0 \\ z_1=\text{low risk} &{} 0<{\widehat{E}}_{{{\mathcal {D}}}_{\text{ leaf }}} \le E_{\min } \\ z_2=\text{middle risk} &{} E_{\min }<{\widehat{E}}_{{{\mathcal {D}}}_{\text{ leaf }}}<E_{\max } \\ z_3=\text{high risk} &{} E_{\max } \le {\widehat{E}}_{{{\mathcal {D}}}_{\text{ leaf }}} \end{array}\right. , \end{aligned}$$
(9)

where \({E_{\min }}\) and \({E_{\max }}\) denote the minimum and maximum energy thresholds that trigger the corresponding child nodes in the belief tree (see Fig. 3). In addition, due to the resisting force \(F_\mathrm{{r}}(t)\) arising from rotational friction and air resistance, \({{{\hat{E}}}_{\mathcal {D}_\mathrm{{leaf}}}}\) gradually decreases to zero during the door-closing process.
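For illustration, the estimator chain of Eqs. (6)–(9) can be sketched as below (the function names are ours, and the default threshold values 0.2 J and 0.4 J are the ones used later in the experiments):

```python
import math

def observed_rotation_angle(PiQ, Pi1Q, OQ, OPi, OPi1):
    """Eq. (6): rotation angle from the law of cosines on the two marked points."""
    ang_PiQO  = math.acos((PiQ**2  + OQ**2 - OPi**2)  / (2.0 * PiQ  * OQ))
    ang_Pi1QO = math.acos((Pi1Q**2 + OQ**2 - OPi1**2) / (2.0 * Pi1Q * OQ))
    return ang_PiQO - ang_Pi1QO

def door_kinetic_energy(m, d, delta_theta, delta_t):
    """Eqs. (7)-(8): angular kinetic energy of the door leaf around its axis."""
    I = m * d**2 / 3.0                     # moment of inertia of the leaf
    omega = delta_theta / delta_t          # observed angular velocity
    return 0.5 * I * omega**2

def risk_level(E_hat, E_min=0.2, E_max=0.4):
    """Eq. (9): map the observed kinetic energy (J) to a discrete risk observation z."""
    if E_hat == 0.0:
        return "no risk"        # z0
    if E_hat <= E_min:
        return "low risk"       # z1
    if E_hat < E_max:
        return "middle risk"    # z2
    return "high risk"          # z3
```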

Responding measures

Last, the four types of responding measures shown in Fig. 5, i.e., the telemanipulation actions and three other types of emergency actions for self-protection, are presented.

Fig. 5 Typical responding actions in the interactive workspace

For the robot platform, dealing with emergencies in the limited workspace by moving the chassis and the arm simultaneously leads to complicated movements and even mission failure. To simplify the problem, we assume that chassis actions and arm actions are mutually exclusive. Based on this, there are four typical classes of actions \(a \in {{\mathcal {A}}}\) in the dynamic workspace:

$$\begin{aligned} {{\mathcal {A}}} = \left\{ {\varvec{a_0}}=\text{telemanipulation},\; {\varvec{a_1}}=\text{on guard},\; {\varvec{a_2}}=\text{defense},\; {\varvec{a_3}}=\text{escape} \right\} , \end{aligned}$$
(10)

where

telemanipulation denotes a task-related action subset, consisting of well-behaved human-in-the-loop operations to deal with the work, as shown in Fig. 5a;

on guard denotes stopping the current action and estimating the collision risk, ready to take the next action according to circumstances, as shown in Fig. 5b;

defense denotes actively defending against the risk of collision damage using the dexterous arm. Figure 5d shows that the defending part might be the end-effector; however, the end-effector is structurally fragile and usually expensive, which makes it unsuitable for actual applications. In contrast, using the elbow joint to defend, as shown in Fig. 5e, plays the dominant role in active self-protection;

escape denotes escaping out of the workspace before collision damage occurs, as shown in Fig. 5c.

The rewards for taking \({\varvec{a_i}}\) after observing \({z_i}\) are pre-trained, as given in Table 1, with good \(=+1\), ok \(=0\), and bad \(=-1\). We treat \(\pi ({\varvec{a_i},z_i}|_{i=0,1,2,3})\) as the balance policy between safety and efficiency performance in the dynamic workspace. Among them, \(\pi ({\varvec{a_0}},z_0)\) and \(\pi ({\varvec{a_3}},z_3)\) correspond to the traditional research areas of performance improvement and stress reaction, respectively, and are subsets of our proposed balance method.

Table 1 Actions at different risk levels and their reward function
Fig. 6 Schematic of the balance policy and control method

Note that the higher the risk level, the less time is available to act \(\pi ({\varvec{a_i},z_i}|_{i=1,2,3})\), which requires control based on risk-time optimization. Let \(t_{z_i}\) denote the collision time at risk level \(z_i\) when no safety-critical measures are taken. Obviously, \(t_{z_3}<t_{z_2}<t_{z_1}\), and the action controller \({f_{a_i}}(\cdot )\) can be written as:

$$\begin{aligned} \mathop {\min }\limits _{0<{t_{a_i}} < {t_{{z_i}}}} \left\{ {{f_{{a_i}}}({{\varvec{q}}},\dot{{{\varvec{q}}}},{t_{a_i}}):{{{\varvec{q}}}_{\varvec{a_0}}} \rightarrow {{{\varvec{q}}}_{\varvec{a_i}}} \in {\mathbb {R}^D}} \right\} , \quad i = 1,2,3, \end{aligned}$$
(11)

where \({t_{a_i}}\) denotes the time to move from the telemanipulation configuration \({{{\varvec{q}}}_{\varvec{a_0}}}\in {\varvec{a_0}}\) to the desired configuration \({{{\varvec{q}}}_{\varvec{a_i}}}\in {\varvec{a_i}}\). This is the switching-control process from telemanipulation to self-protection. Based on this, the schematic of the proposed balance policy and control method is shown in Fig. 6.
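A minimal sketch of the resulting switching logic is given below; the robot interface methods (`continue_telemanipulation`, `hold_configuration`, `move_arm`, `drive_chassis`) and the time budgets are hypothetical placeholders used only to show how the risk level selects and time-bounds the planned action:

```python
def respond_to_risk(robot, z, t_z, q_defense, q_escape):
    """Select the planned action for risk level z and execute it within the budget t_z[z] (Eq. (11))."""
    if z == "no risk":
        robot.continue_telemanipulation()                    # a0: human-in-the-loop task work
    elif z == "low risk":
        robot.hold_configuration()                           # a1: on guard, keep estimating the risk
    elif z == "middle risk":
        # a2: elbow defense, reached by a full-speed linear move of the arm
        robot.move_arm(q_defense, max_time=t_z["middle risk"])
    else:  # "high risk"
        # a3: line escape, driving the chassis straight out of the workspace
        robot.drive_chassis(q_escape, max_time=t_z["high risk"])
```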

Experiments and results

In this section, we will present our experimental conditions first and then set up four types of experiments to verify the proposed balance method’s efficiency.

Experimental set-up

Figure 7 shows the human-in-the-loop robot platform and the dynamic workspace, which is constructed from a standardized power cabinet. The robot platform mainly consists of a chassis, a 6-DOF arm, an end-effector and a Kinect; the Kinect faces the opened door and runs at 30 frames per second on the chassis. In the dynamic workspace, the specific telemanipulation task is to turn on a switch for the electricity supply. Table 2 gives more detailed information and other components.

Fig. 7 Human-in-the-loop robot platform and door-closing workspace in a power cabinet box

Table 2 Robot platform and power cabinet box equipment list

Figure 8 shows the robot platform’s geometric relationship during the switch work. In the top view, the chassis stands partly in the fan-shaped area, and its escape direction is opposite to the end-effector’s working orientation. Without safety-guaranteed measures, a collision would occur at some position on the chassis after \(t_{z_i}\), which is obtained by driving the door closed with a fan at no, low, middle or full power. In addition, for the 6-DOF arm, the relationship between each joint’s coordinate system is indicated in red, green and blue, denoting the coordinate axes \(x_i\), \(y_i\) and \(z_i\), respectively. The base coordinate system \(x_0\), \(y_0\), \(z_0\) is attached to the chassis. We acquire the arm’s initial configuration \({{{\varvec{q}}}_{a^{2}_0}}\) as

$$\begin{aligned} {{{\varvec{q}}}_{a^{2}_0}}=[1.124, -0.947, -1.237, -0.1426, 0.4637, 0.6819]\ \mathrm{{rad}}, \end{aligned}$$

which is the current telemanipulation configuration for reaching the switch.

Figure 9 shows the Kinect camera’s view and the eye-on-hand view. Based on the two views, a well-trained human operator can drive our robot platform to execute the policy \(\pi ({\varvec{a_0},z_0})\). In this case, \(\varvec{a_0}\) denotes the action sequence \((a^{i}_0\in \varvec{a_0}|_{i=1,\cdot \cdot \cdot ,7})\): move into the workspace; reach, clamp, rotate and release the switch; take the arm back; and move out of the workspace (see Fig. 7). In the Kinect camera view, two-dimensional barcodes marking positions P and Q are detected and measured via the point cloud to obtain the distances \(\left| {{P_i}O} \right| \), \(\left| {{P_{i+1}}O} \right| \) and \(\left| {OQ} \right| \) (see Fig. 4). In the following experiments, we only use the Kinect camera as the risk estimator.

Fig. 8 The robot platform’s initial conditions and geometric relationship

Fig. 9 The views of the Kinect camera and the eye on hand

Results and analysis

Based on the experimental conditions above, the policies \(\pi ({\varvec{a_i},z_i}|_{i=0,1,2,3})\) were implemented on the robot platform against the door-closing, as shown in Fig. 10.

Fig. 10 Execution of \(\pi ({\varvec{a_i},z_i}|_{i=0,1,2,3})\) in the real door-closing scenario

Figure 10a shows a balance-policy sequence from the human–robot collaborative experiments with \(\pi ({a^{2}_0|{{{\varvec{q}}}_{a^{2}_0}},z_0})\), \(\pi ({{{\mathrm{{on}\mathrm {-}guard}}\in \varvec{a_1}},z_1})\) and \(\pi ({{{\mathrm{{elbow}\mathrm {-}defense}}\in \varvec{a_2}},z_2})\). The corresponding results are shown in Fig. 11. In the left column of Fig. 11, |OQ| is constant because the chassis is stationary in the workspace; |OP| and \(\angle P_iQO\) gradually decrease under the wind force \(F_\mathrm{{w}} (t)\) after time \(t_2\) and stop changing when the defense collision happens. We use the local maxima of \({{{\hat{E}}}_{\mathcal {D}_\mathrm{{leaf}}}}\) to judge the changes of \(z_i\); a judgment is confirmed when \(z_i\) changes for the first time from \(z_0\) to a higher level. Setting \({E_{\min }}=0.2\) J and \({E_{\max }}=0.4\) J, we obtain the time \(t_2\) (\({E_{\min }}<{{{\hat{E}}}_{\mathcal {D}_\mathrm{{leaf}}}}<{E_{\max }}\)) that triggers \(\pi ({{{\mathrm{{elbow}\mathrm {-}defense}}\in \varvec{a_2}},z_2})\). During the interval \(({t_2}-{t_1})\), the robot platform held the current configuration \({{{\varvec{q}}}_{a^{2}_0}}\), ready to take the next action \({{a^{3}_0}}\), while estimating \({{{\hat{E}}}_{\mathcal {D}_\mathrm{{leaf}}}}\) to ensure it stayed below \({E_{\min }}=0.2\) J. In other words, the robot exhibited vigilant self-protective awareness, in contrast to a manually commanded stop or pause in telemanipulation.

As the risk rose after \(t_2\), the robot platform would have suffered collision damage of more than 1.5 J, which was avoided by the elbow defense. In response to \(z_i\), the switch operations that require precise, small-scale motion are assigned to the end-effector, while the large-scale self-protection actions are fast and carried out by the chassis or the arm. We kept the end-effector’s orientation facing the switch and moved the end-effector horizontally (see Fig. 9) to a pre-trained defense configuration, so that the current work could be resumed quickly after \(\pi ({{{\mathrm{{elbow}\mathrm {-}defense}}},z_2})\). Based on this, the control based on risk-time optimization is realized as a full-speed linear move in the end-effector’s workspace. In the right column of Fig. 11, all the arm joints are involved in executing the configuration and show significant changes; their angular velocities, e.g., \(\omega _{q1}\) and \(\omega _{q5}\), reach full speed in a short time within their physical constraints. The acquired final defense configuration \({{{\varvec{q}}}_{a_2}}\) is

$$\begin{aligned} {{{\varvec{q}}}_{a_2}}=[-0.96, -0.9715, -1.212, -0.1234, 2.5467, 0.914]\ \mathrm{{rad}}. \end{aligned}$$
Fig. 11 The collaborative experiments of the telemanipulation, on-guard and elbow-defense

Fig. 12 The collaborative experiments of the telemanipulation, on-guard and line-escape

Figure 10b shows another balance-policy sequence from the human–robot collaborative experiments with \(\pi ({a^{2}_0|{{{\varvec{q}}}_{a^{2}_0}},z_0})\), \(\pi ({{{\mathrm{{on}}}{\mathrm{{-}guard}}\in \varvec{a_1}},z_1})\) and \(\pi ({{\mathrm{{line}\mathrm {-}escape}\in \varvec{a_3}},z_3})\). The corresponding results are shown in Fig. 12. In Fig. 12, matching the three policies in Fig. 11, \(t_1\) and \(t_2\) are again the times at which the states (\(z_0\), \(z_1\), \(z_3\)) change and the responding actions start. |OP| and \(\angle P_iQO\) both become horizontal after \(t_2\). The difference lies in the cause: in the elbow-defense case, the defense collision stopped the door from closing, whereas in the line-escape case the door leaves the camera’s field of view once the robot escapes out of the workspace. The local maximum at \(t_2\) indicates that the robot is at high risk, which triggers the chassis to escape straight out at its maximum speed. Additionally, compared with the line-escape action, the on-guard and elbow-defense actions have the advantage of predictable performance: the interrupted work can be resumed quickly after the risk is relieved, without the time-consuming cost of re-planning or re-entering the workspace.

Conclusions

In this paper, an approach for balancing safety against performance in door-closing emergencies during human–robot collaborative manipulation has been proposed. Specifically, we first established a dynamic disturbance model of the restricted workspace released by door-opening. Then, the workspace and robot interactions were analyzed using a partially observable Markov decision process (POMDP), so that the balance mechanism was executed as belief tree planning. Responding to the policy, besides the telemanipulation actions, we defined three other safety-guaranteed actions for self-protection: on guard, escape, and defense, triggered by estimated collision risk levels. Finally, we proposed motion controllers based on risk-time optimization to execute the planned self-protective actions. Our self-built robot platform and a dynamically constrained workspace inside a power cabinet were set up to verify the validity and efficiency of the proposed planning and control. This work provides guidance for safe manipulation, for dealing with emergencies in a class of robot operations, and for upgrading motion planning.