1 Introduction

In recent years, the range of robotic applications has expanded rapidly. Unlike traditional working environments such as the assembly line, robots now operate in more complex and dynamic environments and take on new tasks, such as collaborating with humans. As a prerequisite for collision avoidance and motion planning, obstacle detection plays an important role in guaranteeing the safety of human co-workers, the surrounding facilities and the robot itself, and in completing tasks more intelligently.

Various types of sensors have been used in previous studies to achieve environmental perception. Most of these sensors provide a 3D point cloud of the manipulator and the surrounding environment. To understand the surrounding environment, the point cloud processing method can differ considerably between sensor arrangements, i.e., the eye-in-hand arrangement and the eye-to-hand arrangement. In the eye-in-hand arrangement, the sensors are mounted on the end-effector of the robot. In [12], Natarajan uses a 3D Time-Of-Flight (TOF) camera as a proximity sensor to identify the position of the obstacle near the elbow of the manipulator. In [11], Danial introduces a manipulator surface-following algorithm using a 3D model of vehicle body panels acquired by a network of rapid but low-resolution RGB-D sensors. The advantages of the eye-in-hand arrangement are that it provides a higher-resolution point cloud of the obstacle and that the algorithm is relatively simple, because no coordinate transformation is needed to measure the distance between the robot and the obstacle. However, the limited viewing angle and the inevitable occlusion of the view cause many blind spots [2, 13, 14, 24]. In the eye-to-hand arrangement, the sensor is placed outside the working space and senses the whole working space at all times [15, 19]. In [14], Pan presents a new collision- and distance-query algorithm, which can efficiently handle large amounts of point cloud sensor data received at real-time rates. Sukmanee presents a method to distinguish between a manipulator and its surroundings using a depth sensor in [20], where the iterative least square (ILS) and iterative closest point (ICP) algorithms are used for coordinate calibration and for matching the manipulator model with the point cloud. The eye-to-hand arrangement provides a sufficiently large viewing angle, but an unavoidable problem is segmenting the robot itself from the environment during point cloud processing.

Self-identification is defined as the process of identifying the robot itself in the 3D point cloud. In previous works, the general solutions for self-identification are based on pre-built 3D models of the robot or on simple neighbourhoods of the robot skeleton with predefined radii. The manipulators are simply detected and deleted from the point cloud based on the 3D models or the neighbourhood regions [7, 10, 15, 20]. However, an accurate 3D model is not always available for a given robot, and the installation of accessories, such as additional grippers, may greatly reduce the accuracy and reusability of the 3D model. In addition, the inevitable calibration error and the simplification of the manipulator's 3D model may result in incomplete deletion: some points on the robot may not be identified as the robot, but as surrounding environment. These remaining points near the manipulator cause considerable trouble for the subsequent motion planning or obstacle avoidance control. On the other hand, when the simple neighbourhood of the robot skeleton with a predefined radius is used [19], the radius has to be large enough to guarantee that every part of the robot, including the convex portions, is covered by the neighbourhood. Because of this inappropriately large radius, some obstacles near the robot may also be covered by the neighbourhood and detected as part of the robot itself. To prevent an obstacle from dropping into this blind area, the robot then has to avoid the obstacle even when it is still far away, which reduces the flexibility of the robot.

A better way to segment the robot from the environment is to generate a 3D model automatically based on the point cloud. Several model generation approaches can be found in previous studies. In [3], a curvature-based approach is presented to achieve multi-scale feature extraction from 3D meshes and unstructured point clouds. In [22], a novel shape-sensitive point sampling approach, which adapts its sampling regions, is presented to reconstruct a low-resolution point-cloud model. These methods can generate 3D models for a wide variety of objects, but they are designed only for off-line applications, and their processing times of up to tens of seconds make them unsuitable for real-time use. To reduce the processing time, prior information and assumptions about the object or environment to be modelled are always helpful. In [23], a fast, automated, scalable method is proposed to generate textured 3D models of indoor environments, where the regular shapes of the walls and the ceiling are taken into consideration. Similarly, the Manhattan world assumption is widely used for the generation of building structures [5, 18, 27].

In this paper, the kinematic model of the robot is used as prior information. A self-identification method is proposed to automatically generate a simplified 3D model of the robot from the point cloud in real time. The robot skeleton is first created from the forward kinematic model of the robot and matched with the point cloud by calibration. Thereafter, the point cloud is over-segmented into superpixels using a continuous k-means clustering method and then segmented into several meaningful categories. Next, a simplified 3D robot model is generated based on the skeleton of the robot and updated using the segmented points. Finally, a collision prediction method is designed based on the robot model and the obstacle points to achieve real-time collision avoidance.

Our proposed method is distinguished from most existing methods, and its novelties are summarized as follows:

  (i) An over-segmentation method in 3D space is designed based on a continuous k-means clustering method to over-segment the 3D point cloud into superpixels.

  (ii) A segmentation method is proposed based on the forward kinematics of the robot to segment the superpixels into several meaningful categories.

  (iii) A simplified 3D robot model is generated automatically and updated on-line based on the robot skeleton and the segmented superpixels.

  (iv) A collision prediction method is proposed based on the simplified 3D robot model and the obstacle points.

The remainder of this paper is organized as follows: Section 2 gives some preliminaries used in this paper; Section 3 presents the collision prediction method, including self-identification and obstacle detection; Section 4 briefly introduces the control strategy of the robot to achieve obstacle avoidance; Section 5 presents several experiments, compared with a previous method, to evaluate the proposed obstacle detection method; finally, we conclude the paper in Section 6.

2 Preliminaries

2.1 3D point cloud

As a low-cost typical RGB-D sensor, the Microsoft Kinect sensor can generate a detailed point cloud with around 300,000 points at 30 Hz, and is widely used for point cloud capture [1, 6, 11, 14–16, 20, 21, 25]. The raw data obtained from the Kinect sensor include an RGB image and a depth image, as shown in Fig. 1. A 3D point cloud in Cartesian coordinates is generated from these two images; each pixel in the depth image corresponds to one point in the 3D point cloud. Denote the set of points in the 3D point cloud as \(\boldsymbol {P} = \left \{\boldsymbol {p_{i}}\left (x_{i}, y_{i}, z_{i}\right )|i = 0,1,\ldots ,n\right \}\), where n is the total number of pixels in the depth image. The coordinates of the i th point p_i are calculated by (1) [21].

Fig. 1 The raw data obtained from the Kinect sensor

$$ \begin{array}{lll} x_{i} &= d_{i}({x^{c}_{i}} - c_{x})/f_{x}\\ y_{i} &= d_{i}({y^{c}_{i}} - c_{y})/f_{y}\\ z_{i} &= d_{i}, \end{array} $$
(1)

where d_i is the depth value of the i th pixel in the depth image; \(({x^{c}_{i}},{y^{c}_{i}})\) are the coordinates of the i th pixel in the image coordinate system; and c_x, c_y, f_x and f_y are the intrinsic parameters of the depth camera.
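
As an illustration of (1), the following minimal sketch converts a depth image into a point cloud with NumPy; the intrinsic values and the synthetic depth image are placeholders, not the calibrated parameters of the Kinect used here.

```python
# A minimal sketch of (1), assuming a NumPy depth image in metres and
# illustrative intrinsics (fx, fy, cx, cy are placeholders).
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Convert a depth image (H x W, metres) into an N x 3 point cloud."""
    h, w = depth.shape
    # Pixel coordinates (x_c, y_c) for every pixel of the depth image.
    xc, yc = np.meshgrid(np.arange(w), np.arange(h))
    x = depth * (xc - cx) / fx          # x_i = d_i (x_i^c - c_x) / f_x
    y = depth * (yc - cy) / fy          # y_i = d_i (y_i^c - c_y) / f_y
    z = depth                           # z_i = d_i
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example with a synthetic 4x4 depth image and made-up intrinsics.
depth = np.full((4, 4), 1.5)
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=2.0, cy=2.0)
print(cloud.shape)   # (16, 3)
```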

The colors of the points are obtained from the corresponding pixels in the RGB image, which have the same coordinates as in the depth image. The generated point cloud is shown in Fig. 2.

Fig. 2 The 3D point cloud

2.2 Manipulator kinematic model

The forward kinematics solution is used to acquire the skeleton of the manipulator. The self-identification method developed in Section 3.3 is designed based on the forward kinematics and the skeleton. In the skeleton, each link of the robot can be seen as a segment in 3D space. The coordinates of the endpoints of the segments, i.e., the coordinates of the joints, in Cartesian space can be obtained from the kinematic model of the robot, which can be found in our previous work [4] for the Baxter robot, as:

$$ \boldsymbol{X}_{\boldsymbol{i}}\boldsymbol{\prime} = {~}^{0}\!\boldsymbol{A}_{\boldsymbol{1}}\,{~}^{\boldsymbol{1}}\!\boldsymbol{A}_{\boldsymbol{2}}~\cdots~{~}^{\boldsymbol{i}\boldsymbol{-1}}\!\boldsymbol{A}_{\boldsymbol{i}}\, \boldsymbol{X}_{\boldsymbol{0}}\boldsymbol{\prime},~ i = 1,2,\ldots,n, $$
(2)

where \(\boldsymbol {X}_{\boldsymbol {i}}\boldsymbol {\prime } = [x'_{i},y'_{i},z'_{i},1]^{T}\) is an augmented vector of the relative Cartesian coordinates of the i th joint in the robot coordinate system; \(\boldsymbol {X}_{0}\boldsymbol {\prime }\) is the augmented coordinate of the base of the manipulator in the robot coordinate system; and \({}^{i-1}\boldsymbol{A}_{i}\) is the link homogeneous transformation matrix, which can also be found in [4].
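
For illustration, the sketch below chains homogeneous link transforms to obtain the joint positions of the skeleton as in (2); it assumes a standard DH parameterization with made-up link parameters, not the actual Baxter parameters from [4].

```python
# A minimal sketch of (2) with placeholder DH parameters.
import numpy as np

def link_transform(theta, d, a, alpha):
    """Standard DH homogeneous transform  i-1_A_i  (placeholder convention)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.,       sa,       ca,      d],
                     [0.,       0.,       0.,     1.]])

def joint_positions(dh_params, X0=np.array([0., 0., 0., 1.])):
    """Return the augmented joint coordinates X_i' in the robot frame."""
    positions = [X0]
    T = np.eye(4)
    for theta, d, a, alpha in dh_params:
        T = T @ link_transform(theta, d, a, alpha)   # accumulate 0_A_1 ... i-1_A_i
        positions.append(T @ X0)                     # X_i' per (2)
    return np.array(positions)

# Example with three made-up links.
skeleton = joint_positions([(0.3, 0.27, 0.07, -np.pi / 2),
                            (0.5, 0.00, 0.36,  0.0),
                            (0.2, 0.00, 0.37,  0.0)])
print(skeleton.shape)   # (4, 4): base plus three joints
```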

2.3 Calibration

The robot skeleton and the point cloud need to be placed into a unified coordinate system in order to achieve self-identification of the manipulator. The coordinate transformation of the robot skeleton from the robot coordinates to the Kinect coordinates is obtained using the homogeneous transformation in (3).

$$ \boldsymbol{X}_{\boldsymbol{i}} = \boldsymbol{T}\boldsymbol{X}_{\boldsymbol{i}}\boldsymbol{\prime}, $$
(3)

where T is the transformation matrix, which will be defined below, X_i = [x_i y_i z_i 1]^T is the coordinate in the Kinect coordinate system, and \(\boldsymbol {X}_{\boldsymbol {i}}\boldsymbol {\prime } = [x_{i}^{\prime } ~y_{i}^{\prime } ~z_{i}^{\prime } ~1]^{T}\) is the coordinate in the robot coordinate system.

The transformation matrix T can be obtained by measuring the coordinates of four non-collinear points in both the robot coordinates and the Kinect coordinates. Assuming that we have four non-collinear points s_1, s_2, s_3 and s_4, their coordinates in the Kinect and the robot coordinate systems, X−Y−Z and \(X^{\prime }-Y^{\prime }-Z^{\prime }\), are (x_1,y_1,z_1), (x_2,y_2,z_2), (x_3,y_3,z_3), (x_4,y_4,z_4) and \((x_{1}^{\prime },y_{1}^{\prime },z_{1}^{\prime })\), \((x_{2}^{\prime },y_{2}^{\prime },z_{2}^{\prime })\), \((x_{3}^{\prime },y_{3}^{\prime },z_{3}^{\prime })\), \((x_{4}^{\prime },y_{4}^{\prime },z_{4}^{\prime })\), respectively. The transformation matrix T from the robot coordinates to the Kinect coordinates can then be calculated by (4). The point cloud in the Kinect coordinates and the transformed robot skeleton are shown in Fig. 3.

Fig. 3 The calibration results

$$ \boldsymbol{T} = \left[ \begin{array}{cccc} x_{1} & x_{2} & x_{3} & x_{4} \\ y_{1} & y_{2} & y_{3} & y_{4} \\ z_{1} & z_{2} & z_{3} & z_{4} \\ 1 & 1 & 1 & 1 \end{array} \right] \left[ \begin{array}{cccc} x_{1}^{\prime} & x_{2}^{\prime} & x_{3}^{\prime} & x_{4}^{\prime} \\ y_{1}^{\prime} & y_{2}^{\prime} & y_{3}^{\prime} & y_{4}^{\prime} \\ z_{1}^{\prime} & z_{2}^{\prime} & z_{3}^{\prime} & z_{4}^{\prime} \\ 1 & 1 & 1 & 1 \end{array} \right]^{-1} $$
(4)

Remark 1

The coordinates of the four points in the robot coordinate system can be measured by moving the end-effector of the robot to the specified point and then using the robot forward kinematics to calculate its position with respect to the robot coordinate frame. The coordinates in the Kinect coordinate system can be measured by manually selecting the points on the end-effector of the robot in the 3D point cloud.

Remark 2

Each pair of coordinates satisfies (3). By measuring the coordinates of the four points, we obtain four equations in T. Solving these four equations together yields the transformation matrix T, as shown in (4).
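
A minimal sketch of this calibration step follows, assuming four synthetic non-collinear point pairs; the helper name and the test transform are illustrative only.

```python
# A minimal sketch of (3)-(4): recover the homogeneous transform T that maps
# robot-frame coordinates into the Kinect frame from four point pairs.
import numpy as np

def calibrate(points_kinect, points_robot):
    """points_* are 4x3 arrays of corresponding points; returns the 4x4 matrix T."""
    A = np.vstack([np.asarray(points_kinect).T, np.ones((1, 4))])   # unprimed (Kinect)
    B = np.vstack([np.asarray(points_robot).T, np.ones((1, 4))])    # primed (robot)
    return A @ np.linalg.inv(B)                                     # T = A B^{-1}, since A = T B

# Synthetic example: a known rotation + translation applied to robot-frame points.
c, s = np.cos(0.3), np.sin(0.3)
T_true = np.array([[c, -s, 0.,  0.5],
                   [s,  c, 0., -0.2],
                   [0., 0., 1., 1.0],
                   [0., 0., 0., 1.0]])
robot_pts = np.random.default_rng(0).uniform(-1, 1, size=(4, 3))
kinect_pts = (T_true @ np.vstack([robot_pts.T, np.ones((1, 4))])).T[:, :3]
T = calibrate(kinect_pts, robot_pts)
print(np.allclose(T, T_true))   # True
```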

3 Collision prediction

3.1 Region of interest

As the Kinect sensor provides up to 300,000 points per frame, point cloud processing can be highly time-consuming. To improve the processing speed, a region of interest (ROI) is determined based on the skeleton of the robot. Only the points whose distances to the skeleton are smaller than a predefined threshold r_ROI are considered in the subsequent collision prediction algorithms. To obtain the minimum distance between a point and the robot skeleton, several interpolation points on each link of the skeleton are calculated as equal division points between two adjacent joints X_i and X_{i+1}, as shown in (5).

$$ \boldsymbol{x_{k}} = \boldsymbol{X_{i}} + \frac{j}{m}\left( \boldsymbol{X_{i+1}} - \boldsymbol{X_{i}}\right),~i = 0,1,\ldots,n-1 ,~j = 0,1,\ldots,m, $$
(5)

where k = im + j; x_k is the j th interpolation point on the i th link; m is the number of interpolation points on each link; and n is the total number of links of the robot. The distance between a point p_i and the robot skeleton is then defined as the minimum distance between p_i and all the interpolated points x_k on the skeleton, and can be calculated by:

$$ d\left( \boldsymbol{p_{i}}\right) = \min_{k = 0,1,\ldots,m \times n}\left\|\boldsymbol{p_{i}}- \boldsymbol{x_{k}}\right\|. $$
(6)

Denote the set of points inside the ROI as \(\boldsymbol {P_{ROI}} \subseteq \boldsymbol {P}\). The points inside the ROI are obtained by calculating the distance \(d\left (\boldsymbol {p_{i}}\right )\) for every point in the point cloud and comparing it with the predefined threshold r_ROI, as in (7).

$$ \boldsymbol{P_{ROI}} = \left\{\boldsymbol{p_{i}}\left| d\left( \boldsymbol{p_{i}}\right)< r_{ROI},~\boldsymbol{p_{i}} \in \boldsymbol{P}\right.\right\}. $$
(7)

The ROI is shown in Fig. 4 with r_ROI = 0.4 m, where only 13.7% of the points are included in the ROI.
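
The ROI computation in (5)–(7) can be sketched as follows; the joint positions, the cloud and the helper names are synthetic placeholders.

```python
# A minimal sketch of (5)-(7): interpolate points along the skeleton, then
# keep only the cloud points closer than r_ROI to any interpolated point.
import numpy as np

def skeleton_points(joints, m):
    """Interpolation points x_k between adjacent joints X_i and X_{i+1}, per (5)."""
    pts = []
    for Xi, Xj in zip(joints[:-1], joints[1:]):
        for j in range(m + 1):
            pts.append(Xi + (j / m) * (Xj - Xi))
    return np.array(pts)

def roi(cloud, joints, r_roi=0.4, m=10):
    """Return P_ROI: cloud points whose distance d(p_i) to the skeleton is below r_ROI."""
    xk = skeleton_points(joints, m)
    # d(p_i) = min_k || p_i - x_k ||, computed for all points at once, per (6).
    d = np.linalg.norm(cloud[:, None, :] - xk[None, :, :], axis=2).min(axis=1)
    return cloud[d < r_roi]                                        # (7)

joints = np.array([[0., 0., 0.], [0.3, 0., 0.2], [0.6, 0.1, 0.2]])
cloud = np.random.default_rng(1).uniform(-1, 1, size=(5000, 3))
print(roi(cloud, joints).shape)
```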

Fig. 4 The region of interest, r_ROI = 0.4 m

3.2 Over-segmentation

The set of points P_ROI is first over-segmented into superpixels based on the k-means clustering method [9], which aims to partition the points of P_ROI into k_s sets \(\left \{\boldsymbol {S_{1}}, \boldsymbol {S_{2}}, \ldots , \boldsymbol {S_{k_{s}}}\right \}\) so as to minimize the within-cluster sum of squares (WCSS), that is,

$$ \underset{\left\{\boldsymbol{S_{1}},\ldots,\boldsymbol{S_{k_{s}}}\right\}} {\operatorname{arg\,min}} \sum\limits_{i=1}^{k_{s}} \sum\limits_{\boldsymbol{p} \in \boldsymbol{S_{i}}} \left\| \boldsymbol{p} - \boldsymbol{\mu_{i}} \right\|^{2}, $$
(8)

where μ_i is the mean of the points in S_i.

Definition 1

[17] The superpixels are sets of pixels that are roughly homogeneous in size and shape, and are local, coherent, and which preserve most of the structure necessary for segmentation at the scale of interest.

The number of superpixels needs to be large enough to ensure that no superpixel contains more than one kind of the following points: the robot, the obstacle, and the object to be operated. The over-segmentation algorithm is shown in Algorithm 1, and the results are shown in Fig. 5 for different numbers of superpixels and in Fig. 6 for k_s = 30.

Fig. 5 Superpixels with different k_s

Fig. 6 Over-segmentation result of P_ROI with k_s = 30

Algorithm 1

Remark 3

The number of superpixels k_s can be determined from pilot experiments. Extra superpixels increase the computational load, while too few superpixels may lead to a single superpixel containing both the robot and the obstacle. In this work, different values of k_s were tried to find the minimum value that splits the obstacle and the robot into different superpixels, as shown in Fig. 5. Based on these experiments, k_s is set to 30.

Remark 4

In the classic k-means clustering algorithm, the initial means μ_i are selected randomly for each set of observations, which causes unstable clustering results between different frames of the point cloud. As the over-segmentation needs to be performed on every frame, and a smooth clustering result is important for the control strategy, a continuous k-means clustering method is proposed. In this method, the initial means μ_i are selected randomly only for the first frame; for the following frames, the means are inherited from the previous clustering result.
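
A sketch of this continuous k-means over-segmentation is given below; only the frame-to-frame inheritance of the means follows the remark, while the plain Lloyd iteration, the stopping rule and the class name are assumptions.

```python
# A minimal sketch of the continuous k-means over-segmentation of Remark 4:
# the cluster means are seeded randomly only on the first frame and are
# carried over between frames, which keeps superpixel labels stable.
import numpy as np

class ContinuousKMeans:
    def __init__(self, k, n_iter=10, seed=0):
        self.k, self.n_iter = k, n_iter
        self.means = None
        self.rng = np.random.default_rng(seed)

    def segment(self, points):
        """Over-segment one frame of P_ROI into k superpixels; returns per-point labels."""
        if self.means is None:                       # first frame: random seeding
            idx = self.rng.choice(len(points), self.k, replace=False)
            self.means = points[idx].copy()
        for _ in range(self.n_iter):                 # Lloyd iterations minimizing the WCSS (8)
            d = np.linalg.norm(points[:, None, :] - self.means[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for i in range(self.k):
                if np.any(labels == i):
                    self.means[i] = points[labels == i].mean(axis=0)
        return labels

# Two consecutive synthetic "frames"; the second reuses the converged means.
rng = np.random.default_rng(2)
frame1 = rng.normal(size=(2000, 3))
frame2 = frame1 + 0.01 * rng.normal(size=(2000, 3))
km = ContinuousKMeans(k=30)
labels1 = km.segment(frame1)
labels2 = km.segment(frame2)   # means inherited from frame1, so labels stay consistent
```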

3.3 Self-identification

3.3.1 Simplified model

Unlike the previous methods [15, 20], our proposed method does not depend on an accurate 3D model of the robot. Instead of using a prior 3D model, we have designed a self-identification method based on the 3D point cloud to automatically generate a simplified 3D model of the robot, which is suitable for obstacle detection. In this model, the robot is represented by several spheres along the skeleton of the robot. The centres of the spheres are the interpolation points x_k obtained from (5). The initial radii of the spheres are set to 0, as shown in Fig. 7, where the thick red lines are the skeleton of the robot and the green spheres are the initial simplified model of the robot. The self-identification process estimates the radius of each sphere to approximate the real shape of the robot based on the segmentation of the point cloud.

Fig. 7 The simplified model

3.3.2 Segmentation of the robot

The set of points on the robot \(\boldsymbol {P_{r}} \subseteq \boldsymbol {P_{ROI}}\) is segmented from P_ROI to estimate the simplified robot model. As P_ROI has already been over-segmented, the segmentation operates on the superpixels \(\left \{\boldsymbol {S_{1}}, \boldsymbol {S_{2}}, \ldots , \boldsymbol {S_{k_{s}}}\right \}\). The superpixels on the robot are those that have at least one point near the robot skeleton. In other words,

$$ \boldsymbol{P_{r}} = \left\{\boldsymbol{S_{i}}\left|\exists~ \boldsymbol{p} \in \boldsymbol{S_{i}}: d(\boldsymbol{p})<d_{TH}, \boldsymbol{S_{i}}\in \boldsymbol{P_{ROI}}\right.\right\} $$
(9)

where d_TH is a predefined distance threshold. The robot segmentation procedure is given in Algorithm 2, and the segmented points on the robot P_r are shown in Fig. 8.
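
A minimal sketch of the superpixel-level test (9) follows; the function name, its inputs and the synthetic geometry are illustrative, and xk stands for the interpolation points x_k from (5).

```python
# A minimal sketch of (9): a superpixel belongs to the robot if any of its
# points lies within d_TH of the skeleton.
import numpy as np

def segment_robot(cloud, labels, xk, d_th=0.05):
    """P_r per (9).

    cloud:  N x 3 points of P_ROI; labels: superpixel index per point;
    xk:     interpolation points x_k on the skeleton from (5).
    Returns a boolean mask over points and one over superpixel indices.
    """
    d = np.linalg.norm(cloud[:, None, :] - xk[None, :, :], axis=2).min(axis=1)
    k_s = int(labels.max()) + 1
    on_robot_sp = np.array([d[labels == i].min() < d_th if np.any(labels == i) else False
                            for i in range(k_s)])
    return on_robot_sp[labels], on_robot_sp

# Synthetic example: skeleton along the x-axis, cloud scattered around it.
rng = np.random.default_rng(3)
xk = np.linspace([0., 0., 0.], [0.6, 0., 0.], 20)
cloud = rng.uniform(-0.3, 0.9, size=(1000, 3))
labels = rng.integers(0, 30, size=1000)
mask_points, mask_superpixels = segment_robot(cloud, labels, xk)
```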

Fig. 8 Segmentation result with d_TH = 50 mm

Algorithm 2

Remark 5

The threshold d_TH is determined based on the ground truth of the segmentation, by searching for the minimum distance d(p) of the points in each superpixel on the robot. The threshold should be just larger than all of these minimum distances. The segmentation results with different d_TH are shown in Fig. 9. As can be seen, there is a large acceptable range for d_TH, from 50 mm to 100 mm. A too small d_TH results in an incomplete segmentation of the robot, as shown in Fig. 9a–e, while a too large d_TH may cause the obstacle to be identified as part of the robot body. Thus, d_TH = 50 mm is chosen in this work, specifically for the Baxter robot.

Fig. 9 Segmentation results with different d_TH

3.3.3 Model update law

The model update law updates the radius of each sphere of the simplified robot model so that the model exactly contains all the points in P_r. To achieve this, the set of points P_r is re-partitioned among the spheres based on distance, as \(\boldsymbol {P_{r}} = \left \{\boldsymbol {X_{1}}, \boldsymbol {X_{2}}, \ldots , \boldsymbol {X_{m \times n}} \right \}\): the centre of each sphere x_k is taken as the centre of the corresponding sub-set X_k, and each point is added to the sub-set whose centre x_k is closest to it. The containment problem then reduces to letting each sphere centred at x_k exactly contain the points in the corresponding sub-set X_k, which is achieved by setting the radius r_k to the maximum distance between the points in X_k and the sphere centre x_k.

In practice, the update law may encounter two problems: noise in the point cloud and missing points caused by occlusion. For the first problem, as the noise in the point cloud affects the stability of the model, a filter is designed for each sphere, as in (10).

$$ r_{k}^{*}(t) = k_{fr} r_{k}^{*}(t-1) + (1- k_{fr})r_{k}(t),~ k = 1, 2, \ldots, m \times n, $$
(10)

where r_k(t) is the raw radius at time t, r_k*(t) is the filtered radius at time t, and k_fr ∈ (0,1) is the filter factor. For the second problem, if most of the points in a sub-set X_k are not visible, the corresponding radius r_k shrinks rapidly, which is incorrect. To solve this, a threshold T_k on the number of points in X_k is set as a prerequisite of the update: if X_k contains fewer than T_k points, the update of r_k is skipped for this frame of data. The model update law is summarized in Algorithm 3. The identified robot model is shown in Fig. 10.

Fig. 10 Self-identification

Algorithm 3

Remark 6

The threshold T_k guarantees that the radius r_k is updated only if X_k contains enough points to indicate the general size of the robot. It also depends on the density of the point cloud: the higher the density, the larger T_k should be. In this work, the threshold is set to T_k = 20.
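
The update law can be sketched as below; T_k = 20 follows the text, while the filter factor k_fr = 0.8, the helper name and the synthetic geometry are assumptions.

```python
# A minimal sketch of the model update law in Section 3.3.3: each robot point
# is assigned to its nearest sphere centre x_k, the raw radius r_k is the
# maximum distance inside that sub-set, the result is low-pass filtered as in
# (10), and spheres observing fewer than T_k points keep their previous radius.
import numpy as np

def update_model(robot_points, centres, radii_prev, k_fr=0.8, t_k=20):
    """Update the filtered radii r_k* of the simplified model for one frame."""
    radii = radii_prev.copy()
    # Assign every point of P_r to the nearest sphere centre x_k.
    d = np.linalg.norm(robot_points[:, None, :] - centres[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    for k in range(len(centres)):
        in_k = nearest == k
        if in_k.sum() < t_k:                  # occlusion guard: too few points, skip update
            continue
        r_raw = d[in_k, k].max()              # r_k(t): the sphere must contain all of X_k
        radii[k] = k_fr * radii_prev[k] + (1 - k_fr) * r_raw   # filter (10)
    return radii

# Synthetic example: three sphere centres along a link, noisy points around them.
rng = np.random.default_rng(4)
centres = np.linspace([0., 0., 0.], [0.6, 0., 0.], 3)
points = centres[rng.integers(0, 3, 500)] + 0.05 * rng.normal(size=(500, 3))
radii = np.zeros(3)
for _ in range(20):                           # radii converge towards the observed point spread
    radii = update_model(points, centres, radii)
print(radii)
```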

3.4 Obstacle detection

The surrounding points P_SUR = P_ROI − P_r can be further divided into four categories based on the superpixels, namely, the points on obstacles for the left arm and for the right arm, P_ol and P_or; the points on the object to be operated, P_d; and the remaining points. The operated object set of points \(\boldsymbol {P_{d}} \subseteq \boldsymbol {P_{SUR}}\) is defined as the points in the superpixels whose means μ are near the end-effector of the robot, which can be obtained by

$$ \boldsymbol{P_{d}} = \left\{\boldsymbol{S_{i}}\left| \left\|\boldsymbol{\mu_{i}} -\boldsymbol{X_{n}} \right\|<d_{pd}, \boldsymbol{S_{i}}\in \boldsymbol{P_{SUR}}\right.\right\}, $$
(11)

where X_n is the position of the end-effector of the robot, μ_i is the mean of superpixel S_i, and d_pd is the distance threshold for calculating P_d, set to d_pd = 300 mm in this work.

The obstacle sets \(\boldsymbol {P_{ol}} \subseteq \boldsymbol {P_{SUR}} - \boldsymbol {P_{d}}\) and \(\boldsymbol {P_{or}} \subseteq \boldsymbol {P_{SUR}} - \boldsymbol {P_{d}}\) are defined as the points in the superpixels whose means μ are near the movable links of the corresponding manipulator arm. Taking P_ol as an example, it can be obtained by

$$ \boldsymbol{P_{ol}} = \left\{\boldsymbol{S_{i}}\left| d_{l}(\boldsymbol{\mu_{i}}) <d_{obs}, \boldsymbol{S_{i}}\in \boldsymbol{P_{SUR}} - \boldsymbol{P_{d}}\right.\right\}, $$
(12)

where d_l(⋅) is the distance to the movable links of the left arm, defined similarly to (6) using the interpolated points on the movable links of the left arm, and d_obs is the predefined obstacle detection range, set to d_obs = 300 mm in this work.

The four categories are shown in Fig. 11, where the green points indicate the obstacle points P_ol and P_or, and the blue points indicate the points P_d on the object to be operated.
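
A sketch of the superpixel classification in (11)–(12) for one arm is given below; d_pd and d_obs follow the text, while the names and the synthetic geometry are illustrative.

```python
# A minimal sketch of (11)-(12): superpixels of P_SUR are classified as the
# operated object (near the end-effector) or as obstacles for the left arm
# (near its movable links).
import numpy as np

def classify_surroundings(means, x_end, xk_left_movable, d_pd=0.3, d_obs=0.3):
    """means: k x 3 superpixel means of P_SUR; returns boolean masks (object, obstacle_left)."""
    is_object = np.linalg.norm(means - x_end, axis=1) < d_pd            # (11)
    d_left = np.linalg.norm(means[:, None, :] - xk_left_movable[None, :, :],
                            axis=2).min(axis=1)                          # d_l(mu_i), cf. (6)
    is_obstacle_left = (~is_object) & (d_left < d_obs)                   # (12), excluding P_d
    return is_object, is_obstacle_left

# Synthetic example.
rng = np.random.default_rng(5)
means = rng.uniform(-0.5, 1.0, size=(30, 3))
x_end = np.array([0.8, 0.0, 0.2])
xk_left = np.linspace([0.2, 0.1, 0.0], [0.8, 0.1, 0.2], 15)
obj_mask, obs_mask = classify_surroundings(means, x_end, xk_left)
```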

Fig. 11 Obstacle detection

Remark 7

The points P_d on the object to be operated are found near the end-effector positions of the manipulators and are excluded from the obstacle points P_ol and P_or. This is because there will inevitably be a collision between the operated object and the end-effector when the robot manipulates the object; for instance, when the manipulator grabs the object with a gripper, the object intrudes into the gripper. These kinds of “collision” are intentional and should not be avoided.

3.5 Collision points estimation

The collision points, p_cr ∈ {x_k | k = 0, 1, …, m × n} and p_co ∈ P_ol or P_or, are the pair of points, one on the robot and one on the obstacle, that realize the minimum distance between the robot and the obstacle, as shown in Fig. 11. The noise in the depth image from the Kinect sensor propagates into the point cloud and leads to frequent changes in p_co. A first-order filter is therefore designed to obtain a stable and smoothly changing collision point \(\boldsymbol {p_{co}^{*}}\), as given by (13).

$$ \boldsymbol{p_{co}^{*}}(t) = k_{fo}\boldsymbol{p_{co}^{*}}(t-1)+(1-k_{fo})\boldsymbol{p_{co}}(t), $$
(13)

where p_co(t) is the raw collision point on the obstacle at time t, \(\boldsymbol {p_{co}^{*}}(t)\) is the filtered collision point, and k_fo ∈ (0,1) is the filtering strength parameter. With a larger k_fo, the filtered collision point \(\boldsymbol {p_{co}^{*}}(t)\) changes more smoothly but with a larger time delay. Thus, k_fo is selected to guarantee both the control smoothness of the robot and a satisfactory reaction rate.
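
A minimal sketch of the collision point search and the first-order filter (13) follows; k_fo = 0.8 and the example geometry are illustrative values, not those used in the experiments.

```python
# A minimal sketch of Section 3.5: find the nearest pair (p_cr, p_co) between
# the model interpolation points and the obstacle points, then smooth p_co
# with the first-order filter (13).
import numpy as np

def collision_points(xk, obstacle_points):
    """Return (p_cr, p_co): the closest pair between skeleton points x_k and the obstacle set."""
    d = np.linalg.norm(xk[:, None, :] - obstacle_points[None, :, :], axis=2)
    i, j = np.unravel_index(d.argmin(), d.shape)
    return xk[i], obstacle_points[j]

class CollisionPointFilter:
    def __init__(self, k_fo=0.8):
        self.k_fo, self.p_co_star = k_fo, None

    def update(self, p_co):
        if self.p_co_star is None:
            self.p_co_star = p_co.copy()
        else:                                    # (13)
            self.p_co_star = self.k_fo * self.p_co_star + (1 - self.k_fo) * p_co
        return self.p_co_star

# Example usage with a synthetic skeleton and two obstacle points.
xk = np.linspace([0., 0., 0.], [0.6, 0., 0.], 20)
obstacle = np.array([[0.4, 0.3, 0.0], [0.5, 0.25, 0.1]])
p_cr, p_co = collision_points(xk, obstacle)
p_co_star = CollisionPointFilter().update(p_co)
```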

4 Robot control strategy

4.1 Inverse kinematics

In order to control the end-effector of the manipulator to follow the reference trajectory in Cartesian space, a numerical inverse kinematics method is used to calculate the reference joint velocities. Consider one manipulator arm of the robot and denote the joint velocities as \(\dot {\boldsymbol {\theta }}\); then the end-effector velocity is given by (14).

$$ \dot{\boldsymbol{x}} = \boldsymbol{J}\dot{\boldsymbol{\theta}}, $$
(14)

where \(\dot {\boldsymbol {x}}\) is the end-effector velocity and J is the Jacobian matrix of the manipulator.

If the dimension of \(\dot {\boldsymbol {\theta }}\) is larger than the dimension of \(\dot {\boldsymbol {x}}\), i.e., the degrees of freedom (DOF) of the manipulator arm exceed the number of desired end-effector velocity components, the manipulator is kinematically redundant. This redundancy can be used to achieve secondary goals, e.g., obstacle avoidance, in addition to the end-effector trajectory following task.

For the inverse kinematics problem of a kinematically redundant manipulator, an infinite number of solutions can be found. The general solution is given by (15).

$$ \dot{\boldsymbol{\theta}} = \boldsymbol{J}^{\dag}\dot{\boldsymbol{x}}+(I-\boldsymbol{J}^{\dag} \boldsymbol{J})\boldsymbol{z}, $$
(15)

where J^† = J^T(JJ^T)^{-1} is the pseudo-inverse of J, and z is an arbitrary vector that can be used to achieve obstacle avoidance [8].
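
For illustration, the general solution (15) can be evaluated directly with NumPy; the 3 × 7 Jacobian and the secondary-task vector z below are random placeholders, not the Baxter Jacobian.

```python
# A minimal sketch of (15): pseudo-inverse solution plus a null-space term.
import numpy as np

rng = np.random.default_rng(6)
J = rng.normal(size=(3, 7))                      # redundant: 7 DOF, 3 task dimensions
x_dot = np.array([0.1, 0.0, -0.05])              # desired end-effector velocity
z = rng.normal(size=7)                           # secondary-task vector (e.g. obstacle avoidance)

J_pinv = J.T @ np.linalg.inv(J @ J.T)            # J^+ = J^T (J J^T)^{-1}
theta_dot = J_pinv @ x_dot + (np.eye(7) - J_pinv @ J) @ z    # (15)

print(np.allclose(J @ theta_dot, x_dot))         # the task velocity is still achieved: True
```

The null-space term does not disturb the end-effector task because J(I − J^†J) = 0 for a full row-rank Jacobian, which the final check confirms numerically.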

4.2 Collision avoidance strategy

When the collision points p_cr and p_co are found, the manipulator needs to move away from the obstacle. The desired velocity \(\dot {\boldsymbol {x}}_{o}\) for moving away from the obstacle is chosen as in (16), according to our previous work [26].

$$ \dot{\boldsymbol{x}}_{o} = \left\{ \begin{aligned} &\boldsymbol{0} ~ , d \geq d_{o}\\ &\frac{d_{o} - d}{d_{o} - d_{c}}v_{max}\frac{\boldsymbol{p_{cr} - p_{co}}}{d} ~ , d_{c}<d<d_{o}\\ &v_{max}\frac{\boldsymbol{p_{cr} - p_{co}}}{d} ~ , d \leq d_{c}\\ \end{aligned} \right., $$
(16)

where \(d = \left \|\boldsymbol {p_{cr}} - \boldsymbol {p_{co}}\right \|\) is the distance between the obstacle and the manipulator, v_max is the maximum obstacle avoidance velocity, d_o is the distance threshold at which the manipulator starts to avoid the obstacle, and d_c is the minimum acceptable distance, below which the manipulator avoids at maximum speed.
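
A sketch of the avoidance velocity (16) follows; the thresholds d_o, d_c and v_max are illustrative values, not those used in the experiments.

```python
# A minimal sketch of (16): desired avoidance velocity as a function of the
# distance d between the collision points.
import numpy as np

def avoidance_velocity(p_cr, p_co, d_o=0.4, d_c=0.1, v_max=0.3):
    diff = p_cr - p_co
    d = np.linalg.norm(diff)
    if d >= d_o:
        return np.zeros(3)                                   # far enough: no reaction
    if d <= d_c:
        return v_max * diff / d                              # too close: full speed away
    return (d_o - d) / (d_o - d_c) * v_max * diff / d        # linear ramp in between

print(avoidance_velocity(np.array([0.5, 0.0, 0.2]), np.array([0.5, 0.25, 0.2])))
```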

On the other hand, in order to eliminate the influence of the obstruction once the obstacle has been removed, the manipulator is expected to restore its original pose. To achieve this, an artificial parallel system of the manipulator is run in the controller in real time to simulate its pose without the influence of the obstacle, as shown in Fig. 12, where the dashed black line indicates the parallel system. The restoring velocity \(\dot {\boldsymbol {x}}_{r}\) is then designed as follows [26]:

$$ \boldsymbol{\dot{x}_{r}} = \boldsymbol{K_{r}}\boldsymbol{e_{r}}, $$
(17)

where K_r is a symmetric positive-definite matrix and \(\boldsymbol {e_{r}} = \left [\boldsymbol {e_{r1}} ~ \boldsymbol {e_{r2}} ~ \boldsymbol {e_{r3}}\right ]^{T}\) is the vector of joint position errors between the parallel system and the real system.

Fig. 12 The artificial parallel system built using the Baxter kinematic model [4]. The solid black line indicates the real manipulator; the dashed black line indicates the manipulator in the artificial parallel system

The control strategy can be obtained by (18), based on our previous work [26].

$$ \boldsymbol{\dot{\theta}_{d}} = \boldsymbol{J_{e}}^{\dag}\left( \dot{\boldsymbol{x}}_{d}+\boldsymbol{K_{e}}\boldsymbol{e_{x}}\right)+\left( \boldsymbol{I}-\boldsymbol{J_{e}^{\dag} J_{e}}\right)\left[\alpha\boldsymbol{z_{o}} + (1-\alpha)\boldsymbol{z_{r}}\right] $$
(18)

where

$$ \boldsymbol{z_{o}} = \left[\dot{\boldsymbol{x}}_{o}^{T} \boldsymbol{J_{o}}\left( \boldsymbol{I}-\boldsymbol{J_{e}^{\dag} J_{e}}\right)\right]^{\dag}\left( \dot{\boldsymbol{x}}_{o}^{T} \dot{\boldsymbol{x}}_{o}-\dot{\boldsymbol{x}}_{o}^{T} \boldsymbol{J_{o}} \boldsymbol{J_{e}^{\dag} \dot{x}_{e}}\right) $$
(19)
$$ \boldsymbol{z_{r}} = \left[\dot{\boldsymbol{x}}_{r}^{T} \boldsymbol{J_{r}}\left( \boldsymbol{I}-\boldsymbol{J_{e}^{\dag} J_{e}}\right)\right]^{\dag}\left( \dot{\boldsymbol{x}}_{r}^{T} \dot{\boldsymbol{x}}_{r}- \dot{\boldsymbol{x}}_{r}^{T} \boldsymbol{J_{r} J_{e}^{\dag} \dot{x}_{e}}\right) $$
(20)
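
The combined controller (18)–(20) can be sketched as below with random matrices standing in for J_e (end-effector Jacobian), J_o and J_r (Jacobians associated with the collision point and the restoring task). It assumes that ẋ_e denotes the commanded end-effector velocity ẋ_d + K_e e_x, which is not defined explicitly in this excerpt, and α is an illustrative blending factor.

```python
# A minimal sketch of (18)-(20) with placeholder Jacobians and gains.
import numpy as np

n = 7
rng = np.random.default_rng(7)
J_e, J_o, J_r = (rng.normal(size=(3, n)) for _ in range(3))
x_dot_d = np.array([0.05, 0.0, 0.0])          # desired end-effector velocity
K_e, e_x = 2.0 * np.eye(3), np.array([0.01, -0.02, 0.0])
x_dot_o = np.array([0.0, 0.1, 0.0])           # avoidance velocity from (16)
x_dot_r = np.array([0.02, 0.0, -0.01])        # restoring velocity from (17)
alpha = 0.7                                   # illustrative blending factor

x_dot_e = x_dot_d + K_e @ e_x                 # assumed meaning of x_dot_e in (19)-(20)
pinv = np.linalg.pinv
N = np.eye(n) - pinv(J_e) @ J_e               # null-space projector of J_e

z_o = pinv((x_dot_o @ J_o @ N)[None, :]) @ np.atleast_1d(
    x_dot_o @ x_dot_o - x_dot_o @ J_o @ pinv(J_e) @ x_dot_e)       # (19)
z_r = pinv((x_dot_r @ J_r @ N)[None, :]) @ np.atleast_1d(
    x_dot_r @ x_dot_r - x_dot_r @ J_r @ pinv(J_e) @ x_dot_e)       # (20)

theta_dot_d = pinv(J_e) @ x_dot_e + N @ (alpha * z_o.ravel()
                                         + (1 - alpha) * z_r.ravel())   # (18)
print(theta_dot_d.shape)   # (7,)
```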

5 Experiments

In order to verify the performance of the proposed collision prediction method, three groups of experiments are designed: self-identification, collision prediction, and collision avoidance. The setup of the system is shown in Fig. 1c. The obstacle detection program is implemented in Java and runs under Microsoft Windows 8.1 with Processing 2. The computer used for the obstacle detection program is equipped with an Intel Core i5-3470 CPU at a clock speed of 3.2 GHz.

A typical previous obstacle detection method proposed in [15] is implemented as a baseline for the proposed method and is described as follows. The previous method uses the neighbourhood of the robot skeleton with a fixed predefined radius to detect obstacles. Specifically, all the points inside the neighbourhood are considered points on the robot, while the points outside the neighbourhood and inside the ROI are considered obstacles. The radius of the neighbourhood is selected as small as possible under the premise that no points on the robot are considered obstacles. In practice, several convex portions of the robot, e.g., the elbow, make it impossible to set the radius small enough to fit most parts of the robot; otherwise, the convex portions would be considered obstacles close to the robot and would influence the obstacle avoidance control.

5.1 Self-identification

In the first group of experiments, the proposed self-identification method is tested. In order to test the 3D model update law, the left arm of the Baxter robot is wrapped with a piece of clothing to change the shape of the manipulator arm. The identification results with and without the wrapped clothing are shown in Fig. 13. As can be seen, when part of the arm becomes thicker, the model changes automatically to fit the actual situation.

Fig. 13 Self-identification results with and without clothes on the left arm

5.2 Collision prediction

In the second group of comparative experiments, the robot is set to a stationary pose to test the collision prediction quality of the two methods. The results are shown in Figs. 14 and 15 for two different scenes. Each figure contains the RGB image from the Kinect, the result of the previous method, and the result of the proposed method.

Fig. 14 Collision prediction results, scene 1

Fig. 15 Collision prediction results, scene 2

As can be seen in Fig. 14b and e, with the previous method, part of the human hand is identified as the robot and the collision points are not very accurate. When the obstacle is even closer to the robot, as in Fig. 15b and e, the whole hand is identified as the robot by the previous method, and the collision points deviate considerably from the real situation. In contrast, with the proposed method, the human hand is completely segmented from the robot, and the collision points are accurate enough for the collision avoidance task.

5.3 Collision avoidance

In the third group of comparative experiments, the collision prediction results are sent to the collision avoidance controller proposed in Section 4.

The ROI of the previous method is first set the same as that of the proposed method; the collision avoidance results are shown in Figs. 16 and 17. As can be seen in Fig. 16, with the previous method, when the obstacle moves close to the robot quickly, part of the obstacle is identified as the robot and the collision point is incorrectly far from the robot, so the robot cannot move away from the obstacle. In contrast, with the proposed method, the obstacle is always segmented from the robot and the collision point is accurate, so the robot is able to avoid the collision.

Fig. 16 Collision avoidance results and video frames based on the previous method with the same ROI as the proposed method

Fig. 17 Collision avoidance results and video frames based on the proposed method

Then, the ROI of the previous method is set to twice that of the proposed method, so that the robot can avoid the obstacle in advance and the obstacle is less likely to enter the neighbourhood of the skeleton and be identified as the robot. However, the larger ROI causes many more unnecessary avoidance actions. As can be seen in Fig. 18, the obstacle passes by the robot at quite a large distance, yet the robot still changes its pose to avoid it. In contrast, with the proposed method, the same passing obstacle does not cause any unnecessary movements, as shown in Fig. 19.

Fig. 18 Collision avoidance results and video frames based on the previous method with a large ROI

Fig. 19 Collision avoidance results and video frames based on the proposed method

The detection accuracy and the processing frame rate are shown in Table 1. As can be seen, the proposed method has a much higher detection accuracy than the previous method when they use the same ROI. When the previous method uses a large ROI to reach an acceptable detection accuracy, it also causes many more unnecessary avoidance actions. In addition, the proposed method is able to run at a satisfactory frame rate, only slightly slower than the previous method.

Table 1 Evaluation of the proposed and the previous method

Figure 20 shows some examples of failed detection frames, where the blue circles indicate the faulty parts. In Fig. 20a, the protruding portion on the elbow of the robot is detected as an obstacle; this is because the threshold d_TH is too small and the superpixel on the elbow is too far away from the robot skeleton. In Fig. 20b, the simplified 3D model has not yet been fully updated because the program has only been running for a short time, and the excessively small 3D model causes the wrong detection. In Fig. 20c, missing points in the original point cloud prevent part of the 3D model from being updated correctly.

Fig. 20 Examples of failed detection frames. The blue circles indicate the faulty parts

Currently, these failed cases do not affect the obstacle avoidance control, because the problems occur only occasionally and the filter specified in (13) is able to deal with them and to obtain a stable and smoothly changing collision point.

In future work, we will investigate suitable solutions to these problems. A variable threshold d_TH with an automatic tuning strategy based on the generated 3D model may solve the problem in Fig. 20a; an improved model update law with faster convergence can deal with the problem in Fig. 20b; and a global optimization method that takes the radii of neighbouring spheres into consideration in the 3D model update law may solve the problem in Fig. 20c.

6 Conclusions

In this paper, a self-identification method has been developed based on the 3D point cloud and the robot skeleton. The point cloud is first over-segmented into superpixels using the continuous k-means clustering method. The superpixels on the robot are then identified using the robot skeleton obtained from the forward kinematics. Based on the segmented point cloud, a simplified 3D model of the robot is generated automatically in real time. Based on this real-time 3D model, an accurate collision prediction method is proposed. Comparative experiments demonstrate that a robot system equipped with the proposed self-identification and collision prediction methods can avoid collisions even when the obstacle is close to the robot, and at the same time prevents unnecessary robot movements when the obstacle is far enough from the robot.

6.1 Nomenclature

\(\boldsymbol {X_{i}^{\prime }}\): An augmented vector of the relative Cartesian coordinate of the i th joint in the robot coordinate system
\(\boldsymbol {X_{0}^{\prime }}\): The augmented coordinate of the base of the manipulator in the robot coordinate system
\({}^{i-1}\boldsymbol{A}_{i}\): The link homogeneous transformation matrix
d_i: The depth value of the i th pixel in the depth image
\({x^{c}_{i}},~{y^{c}_{i}}\): The coordinates of the i th pixel in the image coordinate system
c_x, c_y: The principal point in the intrinsic parameters of the depth camera
f_x, f_y: The focal lengths in pixels in the intrinsic parameters of the depth camera
x_s, y_s, z_s: The Cartesian coordinates of the end-effector of the manipulator
x_m, y_m, z_m: The Cartesian coordinates of the end-effector of the Omni
X_i: The coordinate of the i th joint in the Kinect coordinate system
x_k: The j th interpolation point on the i th link, k = im + j
P_ROI: The set of points inside the ROI
r_ROI: The radius of the ROI
k_s: The number of superpixels
S_i: The set of points in the i th superpixel
μ_i: The mean of the points in S_i
P_r: The set of points on the robot
r_k(t): The raw radius of the k th sphere of the robot model at time t
k_fr: The factor of the filter in the robot model update law
P_SUR: The surrounding points of the robot
P_ol: The points on the obstacle for the left arm
P_or: The points on the obstacle for the right arm
P_d: The set of points on the operated object
p_cr: The collision point on the robot
p_co: The collision point on the obstacle
\(\dot {\boldsymbol {x}}\): The end-effector velocity
J: The Jacobian matrix of the manipulator
J^†: The pseudo-inverse of J
d: The distance between the obstacle and the manipulator
v_max: The maximum obstacle avoidance velocity
d_o: The distance threshold at which the manipulator starts to avoid the obstacle
d_c: The minimum acceptable distance, below which the manipulator avoids at maximum speed
e_r: The joint position errors between the parallel system and the real system