1 Introduction

Autonomous grasping and manipulation of stationary objects has been studied extensively for several decades. In contrast, far less work has addressed the more challenging problem of grasping moving objects. This paper presents a novel approach to dynamically tracking and grasping moving objects. We further demonstrate how such methods can enable fundamental human–robot collaboration activities, such as a human handing over an object to the robot (Fig. 1). Additional applications might include grasping objects using a robot arm mounted on an erratically moving underwater vehicle, or collaborative assembly of automotive parts.

The mainstream grasping literature has previously addressed a variety of grasping sub-problems, including: grasp synthesis based on analytical or data-driven approaches (Kopicki et al. 2016; Lenz et al. 2015; Levine et al. 2016; Roa and Suárez 2015); grasp control with tactile sensing (Romano et al. 2011); reactive grasping dealing with uncertainty in object location and shape (Hsiao et al. 2010); and integrated grasp and arm motion planning (Fontanals et al. 2014a). Additionally, adaptation strategies have been proposed to increase the robustness of grasps, including: learning to maintain grasp stability based on tactile sensing (Li et al. 2014); learning to adapt hand postures through human corrections (Sauser et al. 2012); using probabilistic inference to detect grasps which are likely to be unsuccessful, and then correct them (Bekiroglu et al. 2016); and human-guided shared-control grasping using dynamic virtual fixtures (Vogel et al. 2016).

In contrast to the above work, grasping of moving objects engenders a variety of difficult new challenges. An important example of this problem is that of human–robot collaboration, where a human hands an object to a robot, and the robot must track the motions of that object and continuously adapt its grasp plan in order to successfully receive the object from the human.

Fig. 1

A human operator handing over a test object to the robot. (Inset) feasible grasp found during the online planning task

Note that this problem cannot be solved by simply combining a conventional vision system (for object pose tracking) with a conventional grasp planner (designed for stationary objects). While the object is moving in unpredictable ways, new grasps and new reach-to-grasp trajectories must be continually replanned on-the-fly. For an arbitrarily moving object, there will typically be periods of time during which no feasible grasp is possible at all. During such periods, it is desirable that the robot continues to “track” the object by keeping its hand as close as possible to a feasible pre-grasp location, even while the actual pre-grasp location is unattainable.

Additionally, even if the robot can keep its hand near a likely grasp location on the moving object (which we refer to as “grasp tracking”), a small rotation of the object can make this grasp location suddenly disappear from the robot’s reachable space and/or from the camera’s field of view. This necessitates immediate replanning of a completely new grasp, for a completely new part of the object, leading to a gross discontinuity as the robot’s trajectory is replanned to reposition the hand near a new object part. We address this problem by showing how the robot can dynamically switch between:

  1. (i)

    a local planner, which causes the hand to smoothly track the object and maintain a steady pre-grasp pose relative to the object;

  2. (ii)

    a global planner, which plans a new grasp, and new reach-to-grasp trajectory, to rapidly move the hand to a different part of the object, when the previous graspable part becomes unreachable.

1.1 Related work

A comparatively small amount of previous literature has explored the more difficult problems of grasping moving objects. Early work by Houshangi (1990) described a system for online visual servoing of a robot manipulator to grasp a moving target. This early work executed a top grasp on a cylinder, constrained to move on a flat surface while viewed from a ceiling-mounted camera, i.e., the problem was reduced to 2D tracking of a circle on a plane, and execution of a pre-programmed grasp on the circle. Other early work, by Allen et al. (1993), used a PUMA robot with a parallel jaw gripper to grasp a moving toy train which was tracked by a pair of cameras. The main contribution of this work was the development of fast algorithms for computing optical flow at real-time frame rates, using the low-power computers of the time, and predictively extrapolating the motion of the toy train to overcome computational time delays. As with Houshangi (1990), the actual grasp planning in this work was relatively simplistic, and worked by simply grasping the toy train from above, while aligning the gripper’s jaws with the direction of the train’s motion. The work of Smith and Papanikolopoulos (1995) was similarly limited, with a rectangular block, moving in a straight line, being grasped from above. Other papers, e.g. Zhang and Buehler (1994), explore control strategies for dynamic grasping via simplified 2D simulation experiments, in which a successful grasp is defined as servoing a gripper until it has the same position, velocity and acceleration as an object to be grasped (represented as a circle).

In contrast, we use an advanced, learning-based grasp planner, which probabilistically generates a large number of likely grasps at many different locations on the surface of arbitrarily shaped objects with complex 3D geometries. In our experiments, the robot continuously replans its grasps and reach-to-grasp trajectories while the objects move along complex, arbitrary six degrees-of-freedom (6DoF) trajectories. As the object moves, the robot dynamically switches between several different reach-to-grasp configurations, i.e., it replans trajectories to reach new parts of an object when old parts become unreachable (due to both kinematic limitations of the robot and the constraints of avoiding obstacles in the environment).

More recently, Kim et al. (2014) presented an impressive programming-by-demonstration approach to catching thrown objects. The system learns object flight dynamics by observing examples, and learns a distribution of feasible grasps from several human-demonstrated grasps on each object. The system searches the space of reachable grasps to find an appropriate catching motion which intercepts the trajectory of the thrown object. Joint ranges were limited to improve the speed of trajectory planning, and grasp selection was simplified by using a heuristic of always orienting the palm of the robot’s hand to be opposite to the thrown object’s velocity vector. Collisions between the arm trajectory and surrounding objects were not explicitly considered. In contrast, our approach allows the robot to exploit potential grasps on any part of the object, regardless of motion direction, and reach-to-grasp trajectories are continuously re-planned (utilising the entire motion space of the robot) by avoiding self-collisions. We also exploit a learning-based grasp planner, which can generalise from a small number of examples to plan grasps on new objects with previously unencountered shapes.

Menon et al. (2014) presented a search-based kinodynamic motion planning algorithm that generates a time-parametrised trajectory for both the arm and the end-effector to grasp a moving object at the earliest feasible point, based on graph search. The system relied on prior knowledge of the object’s trajectory and pre-programmed grasps for each object. The proposed planner is unable to compute a real-time trajectory when the object is very close to the robot. In contrast, our approach utilizes a task space error to relax the goal constraints on the reach-to-grasp trajectory, which enables our robot to perform continuous “grasp tracking”, always maintaining a pre-grasp pose, or a pose that is very close to the desired pre-grasp pose, throughout the object’s motion. We also do not use pre-programmed grasps, and our system can autonomously generate novel grasps on new objects with previously unencountered shapes.

In most previous literature, grasp planning and arm motion planning have been seen as two separate steps. A smaller body of recent literature has started to address the problem of simultaneous grasp and motion planning. Vahrenkamp et al. (2012) proposed an RRT-based planner that combines finding a feasible grasp, solving the inverse kinematics problem and determining a collision-free trajectory that brings the hand to the grasping pose, and presented evaluations in simulation. This work was limited to grasping of stationary objects, in contrast to our focus on moving objects in this paper. Additionally, the grasp planning of Vahrenkamp et al. (2012) is based on a classical force-closure and wrench-space approach, which requires accurate knowledge of physical parameters, e.g. friction coefficients, which are rarely available for real hands and real objects. In contrast, we use a probabilistic grasp planner based on learning, which is capable of generalising to new objects with unknown physical parameters.

Kitaev et al. (2015) presented a physics-based trajectory optimisation method that plans grasp approach trajectories for grasping in cluttered environments. An interesting contribution of this work, which we do not consider, is the idea of planning reach-to-grasp trajectories which push aside clutter objects on the way to grasping a desired object. However, unlike our work, Kitaev et al. (2015) is aimed at grasping stationary objects, and grasping of complex shapes is also not addressed, with experiments limited to simple grasps on primitive blocks in a simulation environment.

Fontanals et al. (2014b) proposed an approach using the initial configuration of the robotic arm and the pose of the target object to simultaneously plan a hand pose and arm trajectory, where grasps are found based on independent contact regions. Again, this work does not address the problem of grasping moving objects, instead using simulation experiments with stationary objects which have simple shapes. Horowitz and Burdick (2012) formulated the combined problem of grasp contact selection, grasp force optimisation, and manipulator arm/hand trajectory planning as a problem in optimal control. This work also did not address the problem of grasping moving objects, although it does consider the finger contact forces that are needed for moving an object after it has been grasped. Simple fingertip point placements are considered, in a simplified 2D simulation environment. In contrast to the above methods, our work explicitly addresses the problem of grasping objects which are moving with arbitrarily complex 6DoF trajectories. Our method is tested in real experiments, using objects which have complex 3D shapes.

1.2 Overview of our approach

Our approach enables a robot to grasp arbitrarily moving objects, using no prior model information. It is based on a learned generative grasp model, which generates offline a diverse set of possible grasp trajectories (see footnote 1) for a query object, i.e., the object which the robot has to follow and grasp. The object pose is estimated and updated by a particle filter-based optimised pose tracker, and is then used by a real-time iterative inverse kinematics solver (IK solver), based on differential evolution optimisation of joint angles, to transform grasp trajectories into the new object frame. At any given instant, a single grasp trajectory is selected online, namely the one with the smallest task space error due to kinematic constraints and collisions. Finally, a real-time arm trajectory planner selects and executes a collision-free trajectory leading to the selected grasp trajectory.

Our approach differs from previous approaches in the literature in the following ways:

  1. 1.

    It relies on a generative grasp model which generates a diverse set of grasp trajectories (also referred to as grasp hypotheses in parts of this paper) for each new object. Typically, for a moving object, only a subset of these trajectories can be executed successfully at any particular moment. Nevertheless, it is very likely that for most object poses this subset will not be empty.

  2. 2.

    Grasp trajectories are selected using a soft measure of trajectory quality, a task space error, which is incorporated into the robot’s motion planning by solving inverse kinematics with a non-classical approach: iterative numerical optimisation of the robot’s joint angles to minimise this error. The robot continuously “tracks” the approach waypoint of the current grasp trajectory by executing trajectories which yield the smallest feasible task space error at any instant. This technique enables handling of situations where no error-free trajectories exist, but where feasible grasps are likely to become available again in the near future.

  3. 3.

    Our approach relies on dynamically switching between two types of planners, depending on the distance of the robot from the current destination pre-grasp position. This technique results in smooth, “local” robot movements (in terms of joint space) as long as the current task space error is not larger than a predefined threshold. However, when the current goal pre-grasp location becomes far from feasible (task space error becomes large), the robot switches to a “global” planner which triggers a gross discontinuity as the robot chooses and moves towards a new pre-grasp pose with respect to the moving object (needed because the object’s motion has caused the previous desired pre-grasp pose to become unachievable). It is worth mentioning that for local trajectory planning, we use a fast anytime local planner which plans collision-free trajectories such that waypoints are guaranteed to stay close to each other in joint space. For global re-planning of new reach-to-grasp trajectories, a non real-time global planner is used to plan collision-free trajectories towards a new choice of grasp pose, that cannot be solved using the local planner.

The remainder of the paper is structured as follows. We describe the components of the system and the robotic hardware in Sect. 2, and the grasp planning algorithm in Sect. 3. Experimental results are presented in Sect. 4, and Sect. 5 provides concluding remarks.

2 System outline

The online moving-object grasping method presented in the next section requires the object pose to be updated continuously throughout the task. To this end, in this section we present a particle filter-based method which is capable of tracking an object’s full 6DoF pose using partial point clouds acquired with a low-cost sensor. Initial object scanning and model building are the primary initialisation steps executed before the tracking process is started. We first present our hardware set-up and then proceed with object shape modeling and 3D pose tracking.

2.1 Robotic platform

The experimental set-up used in this work is shown in Fig. 2. It consists of a 7DoF collaborative robot, a KUKA iiwa 140, whose end-effector is equipped with a two-finger parallel-jaw gripper, a Schunk PG 70 with a stroke of \(68~\mathrm {mm}\). The vision system includes three different cameras: two PrimeSense Carmine RGB-D narrow-angle sensors (task and track cameras) and one Point Grey Grasshopper USB 3.0 camera. The task camera, mounted on top of the gripper, is mainly responsible for acquiring scene point clouds, which are then processed to generate a target object cloud. The obtained cloud is subsequently used to generate a set of possible grasps as well as to initialise visual object tracking. The track camera, located in the workspace, is responsible for tracking the full pose of the moving objects. The Point Grey camera is used merely for recording videos of experiments, which can be augmented to aid visualisation by superimposing virtual views of possible grasps being considered by the planner onto the camera images. All three cameras are calibrated with respect to the robot base frame and are connected to the work computer (4.00 GHz Intel Core i7 CPU with 32 GB of RAM). Communication with the robot controller (running KUKA Sunrise.OS) to transfer motion commands is realised as cross-platform client (C++)–server (Java) communication over UDP, where the client program runs on the work PC. The objects used for human–robot handover experiments are visually tracked with respect to the robot task space and, for safety, the robot’s joint torque limits are constrained.
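
The exact wire format of the UDP motion-command link is not specified here; the following minimal Python sketch illustrates one possible work-PC client, assuming a simple JSON message carrying joint targets. The address, port, message fields and the `send_joint_command` helper are all hypothetical.

```python
import json
import socket

# Hypothetical controller address; the real system uses a C++ client and a
# Java server running on the KUKA Sunrise controller.
ROBOT_ADDR = ("192.168.1.10", 30200)

def send_joint_command(sock: socket.socket, joints, duration_s: float) -> None:
    """Send a single joint-space waypoint to the robot-side server (assumed format)."""
    msg = json.dumps({"cmd": "move_joints",
                      "q": list(joints),        # 7 joint angles [rad]
                      "duration": duration_s})   # time to reach the waypoint [s]
    sock.sendto(msg.encode("utf-8"), ROBOT_ADDR)

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Example: command the arm to an illustrative configuration over 2 seconds.
    send_joint_command(sock, [0.0, 0.5, 0.0, -1.2, 0.0, 1.0, 0.0], 2.0)
```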

Fig. 2

Experimental setup used for this work

2.2 Object shape modeling

The objects used for training and validating our approach are shown in Fig. 3a. In this work, all these objects are represented as (partial) depth point clouds obtained by registering multiple depth clouds. The overall model generation process comprises two steps: scanning and stitching. During the scanning phase, the robot moves to four different poses around the object. At each pose a single depth cloud is acquired (see the first two rows of Fig. 3b) using the wrist-mounted task camera. The robot poses are selected such that all faces of the object are maximally covered. During the stitching phase, the individual depth clouds are first processed to remove the dominant ground plane. The remaining points from the processed clouds are then registered together to form a model of the object. The last row of Fig. 3b shows multiple views of the final registered point cloud of a test object. In this case, stitching the point clouds is trivial since the exact pose of the camera is known from the robot’s forward kinematics. Alternatively, the stitching accuracy can be improved using cloud registration algorithms such as iterative closest point (ICP) (Rusinkiewicz and Levoy 2001) or the three-dimensional normal distributions transform (Magnusson 2009). Figure 3c shows registered partial point clouds of various test objects. Since our algorithm, presented in the next section, is capable of working robustly on partial point clouds, no effort was made in this work to interpolate or smooth these partial clouds.
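
A minimal sketch of the stitching step, assuming each partial cloud is delivered together with its camera-to-base transform from the robot's forward kinematics. The simple height-threshold ground removal and the `stitch_partial_clouds` helper are illustrative stand-ins for the plane segmentation used in the real pipeline.

```python
import numpy as np

def stitch_partial_clouds(clouds, camera_poses, ground_z=0.01):
    """Register partial depth clouds into a common (robot base) frame.

    clouds:        list of (N_i, 3) arrays in each camera frame
    camera_poses:  list of 4x4 homogeneous camera-to-base transforms,
                   obtained from the robot's forward kinematics
    ground_z:      points below this height (in the base frame) are treated
                   as the dominant ground plane and removed (simplified here)
    """
    registered = []
    for cloud, T in zip(clouds, camera_poses):
        homog = np.hstack([cloud, np.ones((cloud.shape[0], 1))])  # (N, 4)
        in_base = (T @ homog.T).T[:, :3]                          # to base frame
        in_base = in_base[in_base[:, 2] > ground_z]               # drop ground plane
        registered.append(in_base)
    return np.vstack(registered)  # object model as a single point cloud
```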

Fig. 3

a Objects used in this work. The red line separates training and test objects: the two objects on the left are used for training and those on the right for validation, as labeled: (1) dishwash, (2) can, (3) bar, (4) coffee box, (5) jug, (6) big can, (7) lego toy. b Illustration of object modeling. Point clouds obtained from the four scanning poses are shown in the first two rows. Multiple views of the final segmented and registered partial cloud of a test object are shown in the third row. c Partial point clouds of the remaining test objects (Color figure online)

2.3 Object pose tracking

Visual tracking of the object’s 6DoF pose plays an important role in planning and controlling the robot trajectories. In this work, we use a particle filter-based 3D tracker to track the full 6DoF pose of a moving object in order to generate both grasp and motion plans (Aldoma et al. 2012). Although this type of visual object tracking is not new, many researchers in the fields of computer vision and robotics have previously demonstrated its robustness and adaptability on challenging problems (Fukui et al. 2016; Marturi et al. 2015; Wu et al. 2013). Nevertheless, it is worth mentioning that, due to the highly modular nature of our implementation, the current pose tracker can easily be replaced with another one, e.g. a CAD model-based tracker (Ortenzi et al. 2016; Tamadazte et al. 2010) or a coupled-layer pose tracker (Ma et al. 2016).

In this work, the tracker is initialised using the registered point cloud of the object. Thereafter, for each new cloud, the tracker is initialised with the pose tracked in the previous frame and propagates a uniformly sampled set of N particles, where each particle represents a candidate 6DoF pose of the object. The likelihood of each particle is computed according to the degree of fit between a reference point cloud (for the first cloud, this is the registered cloud from the object modeling step) and the input cloud obtained from the track camera, using the distance likelihood function \(\mathcal {L}_{dist}(r_j, c_j)\) given by (1):

$$\begin{aligned} \mathcal {L}_{dist}(r_j, c_j) = \frac{1}{1+ |r_j - c_j|^2} \end{aligned}$$
(1)

where \(r_j\) and \(c_j\) are the jth points in the reference and current (input) point clouds, respectively. The weight of each particle, \(\omega _{i},\ i=1\ldots N\), is then computed using (2)

$$\begin{aligned} \omega _i = \sum _j \mathcal {L}_{dist}(r_j, c_j) \end{aligned}$$
(2)

The particle with the maximum weight will be retained as the reference cloud for the next iteration and its pose will be used as an input for the robot’s motion planning module.

To speed up the overall process, we parallelised the likelihood computation over multiple threads on a CPU. For more demanding cases, such as high-volume tracking, the processing speed can be further improved by implementing these computation routines on a GPU, as shown by Choi and Christensen (2013). The tracking speed also depends on the number of particles and on the size of the target object, i.e., the density of points in the clouds. For high-volume clouds, the tracking speed can be increased by downsampling the points and by reducing the number of particles. However, this can affect the overall visual tracking accuracy (and in turn the grasp selection accuracy), so a good trade-off needs to be established beforehand. In this work, by trial and error based on failure rate, we selected a downsampling size of \(0.5~\mathrm {cm}\) and set the number of particles to \(\left( \frac{S}{75} \right) \), where S is the total number of points in the cloud. Alternatively, this selection process can be automated using the adaptive particle filter proposed by Soto (2015).
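
A minimal sketch of the per-particle weighting of Eqs. (1)–(2) and of the parameter rule above. The nearest-neighbour correspondences via a KD-tree and the helper names are assumptions for illustration, not the tracker's actual implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def particle_weights(reference_cloud, input_cloud, particle_poses):
    """Weight each candidate 6DoF pose (particle) as in Eqs. (1)-(2).

    reference_cloud: (M, 3) model points retained from the previous iteration
    input_cloud:     (K, 3) points from the track camera
    particle_poses:  list of 4x4 candidate object poses
    """
    tree = cKDTree(input_cloud)
    weights = []
    ref_h = np.hstack([reference_cloud, np.ones((len(reference_cloud), 1))])
    for T in particle_poses:
        moved = (T @ ref_h.T).T[:, :3]            # model under candidate pose
        dists, _ = tree.query(moved)              # |r_j - c_j| per point (assumed correspondence)
        weights.append(np.sum(1.0 / (1.0 + dists ** 2)))  # Eqs. (1)-(2)
    return np.asarray(weights)

def tracker_parameters(num_points_S, leaf_cm=0.5):
    """Parameter rule used in this work: 0.5 cm downsampling, S/75 particles."""
    return leaf_cm / 100.0, max(1, num_points_S // 75)
```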

3 Online grasping algorithm

The proposed online grasping algorithm is summarized in Algorithm 1. We consider tracking (the arm following the object) and grasping of each object as a trial. Each trial consists of five different phases: (i) grasp generation, (ii) tracking, (iii) selection, (iv) planning and (v) grasp execution. Phases (ii)–(iv) are repeated until a human co-worker gives the robot permission to enter the grasp execution phase (v). Each of these phases is detailed below.

Algorithm 1

3.1 Grasp generation

At the beginning of each experimental trial, for a stationary object (registered depth point cloud), a number of grasp trajectories \(\mathcal {T}^{v^w}_j\) are generated using the generate() function. These grasp trajectories are sequences of waypoints \((v^w, c^h)\) given by (3), each consisting of a hand-wrist pose \(v^w \in SE(3)\) and a hand-joint configuration \(c^h \in \mathbb R^{N^h}\), with \(N^h\) being the number of hand joints,

$$\begin{aligned} \mathcal {T}^{v^w}_j = \{(v^w, c^h)_{i}\} \text{ with } i=1\ldots m_j, \end{aligned}$$
(3)

where \(m_j\) is the number of waypoints of the jth trajectory.

The first waypoint of a grasp trajectory, called the approach waypoint, has index 1 (pre-grasp). The last waypoint, the grip waypoint, specifies the hand–object contact and has index \(m_j\), which may differ between grasp types (Kopicki et al. 2016). The grasp trajectory generator, represented by generate(), relies on a generative grasp learning approach introduced in our previous work (Kopicki et al. 2014, 2016). From a single- or multiple-view point cloud, it generates a number of grasp hypotheses, i.e., grasp trajectories \(\mathcal {T}^{v^w}_j\), of which only the \(\mathcal {N}\) most likely ones are selected. In Sect. 4, we analyse the effect of the number \(\mathcal {N}\) of selected hypotheses on the performance of our algorithm. The grasp hypotheses are generated by considering a hand configuration model and an object contact model, both learned during training, where the contact model captures the relation between a rigid link of the hand and the local object shape near its point of contact. Together, these two models are represented as probability distributions (kernel density estimation models). For a query object, candidate grasps are explored to find one that maximises the product of these distributions.

This approach does not assume any object models, and remains robust even when the training and validation objects are substantially different (see Fig. 3). The grasp planning algorithm (Kopicki et al. 2016) provides a diverse set of grasp trajectories, of which only a subset is expected to be feasible at any particular moment, given a moving object and static obstacles.
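
A minimal sketch of how the grasp hypotheses of Eq. (3) might be represented and pruned to the \(\mathcal {N}\) most likely candidates. The learned generative model of Kopicki et al. (2016) is treated here as a black box that assigns each trajectory a likelihood score; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

Pose = np.ndarray          # 4x4 homogeneous wrist pose v^w in SE(3)
HandConfig = np.ndarray    # hand-joint configuration c^h, shape (N_h,)

@dataclass
class GraspTrajectory:
    """Grasp trajectory T_j of Eq. (3): waypoints from the approach waypoint
    (index 1) to the final grip waypoint (index m_j)."""
    waypoints: List[Tuple[Pose, HandConfig]] = field(default_factory=list)
    likelihood: float = 0.0   # score assigned by the learned generative model

def select_top_hypotheses(trajectories: List[GraspTrajectory],
                          n: int) -> List[GraspTrajectory]:
    """Keep only the N most likely grasp hypotheses for online tracking."""
    return sorted(trajectories, key=lambda t: t.likelihood, reverse=True)[:n]
```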

3.2 Tracking

The tracking phase starts after grasp generation. During this phase, a human operator starts moving (arbitrarily) the object for which a set of feasible grasps was generated during the grasp generation phase. The robotic arm “tracks” the grasp approach waypoint of the current grasp trajectory \(\mathcal {T}^{v^w}_r\), but without executing the entire grasp trajectory, i.e., the final grasp waypoint is not reached during this phase.

Tracking, track(), relies on the previously described visual object pose tracker, which continuously estimates pose changes \(s\in SE(3)\) of the moving object’s point cloud \(\mathcal {C}\). These updated poses are then used to transform the wrist poses of all generated grasp trajectory waypoints into the updated object frame, as shown in (4).

$$\begin{aligned} s \otimes \mathcal {T}^{v^w}_j = \mathcal {T}^{s v^w}_j \end{aligned}$$
(4)

where \(\otimes \) denotes the rigid body transformation applied to the entire set of waypoints of the grasp trajectories (see (3)), and s is the object pose update (given by (5)), computed from the current point cloud pose \(v^\mathcal {C}\) with respect to the initial or reference pose \(v^\mathcal {C^I}\),

$$\begin{aligned} s = v^\mathcal {C} (v^\mathcal {C^I})^{-1}. \end{aligned}$$
(5)

In (4), \(\mathcal {T}^{s v^w}_j\) is the set of grasp trajectories in the updated object frame and is represented as

$$\begin{aligned} \mathcal {T}^{s v^w}_j \equiv \{(s v^w, c^h)_{ij}\} \text{ with } i=1\ldots m_j \end{aligned}$$
(6)

where \(s v^w\) are the transformed wrist poses.
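
A minimal sketch of the tracking-phase update of Eqs. (4)–(6), with poses represented as \(4\times 4\) homogeneous matrices (an implementation choice assumed here); function names are illustrative.

```python
import numpy as np

def pose_update(v_C: np.ndarray, v_C_init: np.ndarray) -> np.ndarray:
    """Eq. (5): s = v^C (v^{C_I})^{-1}, with both poses given as 4x4 matrices."""
    return v_C @ np.linalg.inv(v_C_init)

def transform_grasp_trajectories(trajectories, s: np.ndarray):
    """Eqs. (4) and (6): apply the rigid-body update s to every wrist pose v^w
    of every waypoint; hand-joint configurations c^h are left unchanged.

    trajectories: list of trajectories, each a list of (v_w, c_h) pairs.
    """
    return [[(s @ v_w, c_h) for (v_w, c_h) in traj] for traj in trajectories]
```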

3.3 Selection

In this phase, the grasp trajectory to be executed for the current object pose is selected. The current grasp trajectory \(\mathcal {T}^{c^a}_r\) is the rth trajectory, computed and selected from the previously obtained set of grasp trajectories \(\{\mathcal {T}^{s v^w}_j\}_{j=1{\ldots }\mathcal {N}}\) (with \(\mathcal {N}\) being the number of hypotheses) in a two-stage process, as explained below.

3.3.1 Anytime iterative inverse kinematic solver

In the first stage, grasp trajectory waypoints \((s v^w, c^h)_{ij}\) with wrist poses \(s v^w \in SE(3)\) are transformed by inverse() into trajectories in the complete joint space of the robot,

$$\begin{aligned} \mathcal {T}^{c^a}_j = \{(c^a, c^h)_{ij}\} \text{ with } i=1\ldots m_j \end{aligned}$$
(7)

with hand-joint configuration \(c^h \in \mathbb R^{N^h}\) and arm-joint configuration \(c^a \in \mathbb R^{N^a}\), where \(N^a\) is the number of arm joints. Trajectories given by (7) are computed using our anytime, iterative inverse kinematic solver (IK solver), which is based on differential evolution optimisation of joint angles rather than on, e.g., inverting or pseudo-inverting the Jacobian as in most classical methods. A complete theoretical description of the inverse kinematics algorithm can be found in Sect. 4.2.3.4 of Kopicki (2010).

The IK solver provides a collision-free solution \(\mathcal {T}^{c^a}_j\) for any input target \(\mathcal {T}^{s v^w}_j\), given various task-space constraints such as collision bounds or joint limits. This is possible because solutions are allowed to be inaccurate, with accuracy measured by task space errors (see Eq. (8)). The algorithm favours solutions whose waypoints \(c_{ij}\) are close to the current robot configuration \(c^c\). Furthermore, the algorithm can be interrupted after a specific amount of time to preserve a fixed duration of the robot control cycle in phases (ii) to (iv).
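
The following sketch illustrates the idea of solving IK by direct optimisation over joint angles rather than Jacobian inversion. It uses SciPy's differential evolution as a stand-in for the anytime solver of Kopicki (2010); the forward kinematics, the collision predicate and the simplified orientation term (absolute quaternion dot product, cf. Eq. (8)) are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import differential_evolution

def solve_ik(target_pose, forward_kinematics, joint_bounds,
             current_config, in_collision, a=0.4, time_limit_iters=50):
    """Approximate IK by optimising joint angles directly (no Jacobian inversion).

    target_pose:        (position (3,), quaternion (4,)) of the desired wrist pose
    forward_kinematics: maps joint angles -> (position, quaternion)  [placeholder]
    joint_bounds:       list of (low, high) per joint
    in_collision:       boolean predicate on a joint configuration   [placeholder]
    """
    p_t, q_t = target_pose

    def cost(q):
        if in_collision(q):
            return 1e6                                      # reject colliding solutions
        p, quat = forward_kinematics(q)
        lin = np.sum((p_t - p) ** 2)                        # linear term, cf. Eq. (8)
        ang = 1.0 - abs(np.dot(q_t, quat))                  # simplified angular term
        closeness = 1e-3 * np.sum((q - current_config) ** 2)  # prefer nearby configurations
        return (1 - a) * lin + a * ang + closeness

    result = differential_evolution(cost, joint_bounds,
                                    maxiter=time_limit_iters,  # crude "anytime" budget
                                    tol=1e-6, seed=0)
    return result.x, result.fun   # approximate joint solution and its task-space error
```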

3.3.2 Error-based selection

The quality of an inverse kinematic solution is measured by a task space error, which compares the desired wrist pose \(s v^w\) to the pose computed by the forward kinematics map \(\mathcal {K}^{for}: \mathbb R^{N^a} \longrightarrow SE(3)\) from the approximate solution \(\widetilde{c}^a\),

$$\begin{aligned} \epsilon _{ij}&= (1 - a)~\Vert \texttt {lin}(s v^w_{ij}) - \texttt {lin}(\mathcal {K}^{for}(\widetilde{c}^a_{ij}))\Vert ^2 \nonumber \\&\qquad \,\, +a~(1 - |\texttt {ang}(s v^w_{ij}) \cdot (\texttt {ang}(\mathcal {K}^{for}(\widetilde{c}^a_{ij})))^{-1}|) \end{aligned}$$
(8)

where \(\texttt {lin}(.)\) extracts the \(\mathbb R^3\) translation vector of its argument, \(\texttt {ang}(.)\) extracts the SO(3) quaternion of its argument, \(\Vert .\Vert \) is the Euclidean norm, \(\cdot \) is the quaternion dot product, and \(a \in [0, 1]\) is a coefficient weighting the linear and angular distances.

The overall error of the j-th trajectory, \(\epsilon _j\), is the mean of the task space errors over all of its waypoints,

$$\begin{aligned} \epsilon _j = \frac{1}{m_j} \sum _{i=1}^{m_j} \epsilon _{ij} \end{aligned}$$
(9)

After computing all grasp trajectories together with corresponding errors \(\{\mathcal {T}^{c^a}_j, \epsilon _j\}_{j=1{\ldots }\mathcal {N}}\), the k-th trajectory with the least trajectory error \(\epsilon _k\) is found,

$$\begin{aligned} \{r, \epsilon ^*_r\} = {\left\{ \begin{array}{ll} k, \epsilon _k &{} \text {if }\epsilon _k+\delta _w<\epsilon ^*_r\text {, where } k= \underset{j}{\arg \min }\, \epsilon _j\\ r, \epsilon ^*_r &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(10)

The k-th trajectory becomes the current grasp trajectory r if its error \(\epsilon _k\) is smaller than the previous least error \(\epsilon ^*_r\) by more than an offset \(\delta _w \in \mathbb R^{+}\). The offset \(\delta _w\) prevents overly frequent trajectory switching and is selected by trial and error. For the experiments reported in this paper, we used a value of \(\delta _w = 2.0\).
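
A minimal sketch of the error averaging of Eq. (9) and the hysteresis-based switching rule of Eq. (10); the per-waypoint errors are assumed to have been computed as in Eq. (8).

```python
import numpy as np

def trajectory_error(waypoint_errors):
    """Eq. (9): mean task-space error over a trajectory's waypoints."""
    return float(np.mean(waypoint_errors))

def select_current_trajectory(errors, r, eps_r, delta_w=2.0):
    """Eq. (10): switch to the best trajectory k only if it improves on the
    current one by more than the offset delta_w (hysteresis against chattering).

    errors: list of trajectory errors eps_j for all hypotheses
    r:      index of the current grasp trajectory
    eps_r:  stored least error of the current trajectory
    """
    k = int(np.argmin(errors))
    if errors[k] + delta_w < eps_r:
        return k, errors[k]      # adopt the new trajectory
    return r, eps_r              # keep the current one
```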

3.4 Planning

This fourth phase of our algorithm involves finding a tracking trajectory \(\mathcal {T}^{c^c}_r\) from the current robot configuration \(c^c\) to the approach waypoint \(c_{1r}\) of the current grasp trajectory \(\mathcal {T}_r\) selected in the previous phase. This trajectory is then executed immediately.

3.4.1 Tracking trajectory planning

The tracking trajectory \(\mathcal {T}^{c^c}_r\) is found using one of two trajectory planners (see details in Sect. 4.4.4 of Kopicki (2010)):

  • a local planner based on iterative optimisation, represented as local(); or

  • a global planner based on probabilistic roadmaps (PRM), represented as global().

Both planners find a collision-free trajectory \(\mathcal {T}^{c^c}_r\) from the current robot configuration \(c^c\) to the target approach waypoint \(c_{1r}\). The local planner guarantees that all trajectory waypoints \(\{c_i\}\) lie in some neighbourhood of \(c^c\) and \(c_{1r}\). The global planner does not guarantee this property, although it tries to find the shortest path from \(c^c\) to \(c_{1r}\). The decision whether to use the global or the local planner is made based on the Euclidean distance between \(c^c\) and \(c_{1r}\),

$$\begin{aligned} \mathcal {T}^{c^c}_r = {\left\{ \begin{array}{ll} \texttt {global}(c^c, c_{1r}) &{}\text {if }\Vert c^c - c_{1r}\Vert ^2>\delta _c\\ \texttt {local}(c^c, c_{1r}) &{}\text {otherwise}\end{array}\right. } \end{aligned}$$
(11)

where \(\delta _c \in \mathbb R^{+}\) is a threshold which controls the switching between global and local planning. Typical values of \(\delta _c\) lie between 0 and 1, where smaller values allow faster switching.
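
A minimal sketch of the switching rule of Eq. (11); `local_plan` and `global_plan` are placeholders for the planners of Kopicki (2010).

```python
import numpy as np

def plan_tracking_trajectory(c_current, c_approach, local_plan, global_plan,
                             delta_c=0.2):
    """Eq. (11): use the global planner only when the approach waypoint is far
    from the current configuration (squared joint-space distance > delta_c)."""
    if np.sum((np.asarray(c_current) - np.asarray(c_approach)) ** 2) > delta_c:
        return global_plan(c_current, c_approach)   # may jump to a new object part
    return local_plan(c_current, c_approach)        # smooth local tracking
```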

3.4.2 Tracking trajectory execution

The robot executes the tracking trajectory \(\mathcal {T}^{c^c}_r\) at the end of each control cycle. In this way, the robot tries to stay at the approach waypoint of the current grasp trajectory until tracking is interrupted.

3.5 Grasp execution

In the current implementation, and for demonstration purposes, a human operator decides when to interrupt tracking, i.e., to stop the arm following the object and execute the final stage of the current grasp trajectory \(\mathcal {T}_r\), which means reaching the final grasp waypoint. The online grasping algorithm guarantees that \(\mathcal {T}_r\) is a trajectory with a task space error \(\epsilon ^*_r\) no larger than \(\epsilon _k+\delta _w\), where \(\epsilon _k\) is the smallest trajectory error among all grasp trajectories (see Eq. (9)).

4 Experimental evaluation

Various experiments are conducted using the system presented in Sect. 2 to evaluate the performance of our proposed approach for adaptively grasping moving objects. Validation has been performed considering different aspects of the algorithm. Firstly, we evaluate the performance of our approach qualitatively in the scenario where a human hands over different test objects to the robot, which is also the central objective of this work. The main motivation behind these tests is two-fold: to determine which objects are suitable for our current implementation (especially for visual tracking) and to analyse the system performance in terms of repeatability. Next, we evaluate the effect of various parameters in a semi-controlled manner, where again a human demonstrator moves the objects, but in a pre-defined, systematic way. It is worth noting that the human operator is well trained in using industrial robots and possesses extensive knowledge of the developed system. Before proceeding with the experimental analysis, we begin this section by detailing the training process and analysing the visual tracker.

4.1 Grasp training

The grasp training was performed as follows. Initially, we obtain the stitched point clouds of the two training objects (a wooden block and a can) depicted in Fig. 3 by scanning them from four different views. Using our implemented interface, a human operator then demonstrates each possible grasp kinesthetically on the obtained point cloud. During this process, different hand poses are recorded along the trajectory from the approach configuration to the final grip (contact), as shown in Fig. 4. Due to the simple two-finger (parallel-jaw) configuration of the gripper, only pinch and power grasp configurations are used in this work. The hand-joint configuration and contact models are learned from the final configuration of this trajectory and are used for generating grasp hypotheses as explained in Sect. 3.1. An important point to note is that the training is performed solely using kinematic information; no tactile information is used.

Fig. 4

Sample screenshots obtained during the training process, where an operator demonstrates the possible grasps kinesthetically using our software interface. Final grip configuration of the robotic parallel gripper with contact regions (in cyan) can be seen in the images (Color figure online)

4.2 Analysing visual object pose tracker

From the description of our method, it is clear that visual object pose tracking plays a vital role in the success of our approach, i.e., in dynamically adapting grasp trajectories for moving objects. In this context, we assess the performance of the implemented particle filter-based tracker irrespective of object size, shape and colour. As mentioned before, the tracker is initialised using the supplied depth point cloud of the target object (no RGB information is used, except for visualisation), and its various parameters are set as explained in Sect. 2.3. Once initialised, the object is moved arbitrarily by a human while its pose is tracked and updated continuously. The top two rows of Fig. 5 show the results of tracking two test objects at different instances of time, and the bottom row of Fig. 5 shows the trajectories followed by the objects in task space. After evaluating all the test objects over ten different tracking trials spanning 2 min, we determined that, with our choice of parameters, we are able to process 7.62 clouds per second on average. Since the tracker performed reliably in all cases, we consider it suitable for online task planning. Nevertheless, an important point to highlight is target object occlusion. Even though particle filter-based trackers have previously been shown to be robust to large occlusions, in this work we avoid such cases; the human is constrained to hold the objects only at the bottom while working with the real robot. Moreover, since no object detection (e.g. model matching) is used, tracker re-initialisation is not considered in this work. Therefore any trial in which tracking is lost is treated as a failure.

Fig. 5

The first two rows show screen-shots obtained during pose tracking of (top) the coffee filter box and (middle) the dishwasher soap. The objects are moved randomly in the task space. The bottom row shows the trajectories followed by these two objects, respectively

Fig. 6

Grasp adaptation experiments: the test objects are segmented out from the background and the initial grasp set is chosen based on the segmented cloud. Virtual models of the robot, hand and object are shown superimposed over the observed camera images of the scene. Red points show the segmented depth cloud of the selected object (first column). As the objects are moved, the grasp set is updated accordingly to include applicable candidates (middle three columns). Gray frames indicate all the possible grasp candidates, the red frame represents the current grasp followed by the robot and the blue frame points to the next best possible candidate. The point cloud inside the blue bounding box illustrates the tracked object. Successful grasps are shown in the last column. The supplementary video illustrates the results in more detail. The trial with the bar failed during rotation, where the visual tracker lost the object (see the middle two columns of the last row) (Color figure online)

Fig. 7

Analysis of planner switching while handling the dishwash object. The red curve represents the error of the currently executed grasp trajectory and the green curve shows the previous error. The selected offset is shown with a dotted line. As explained with Eq. (10), a trajectory becomes the current grasp trajectory when its error is smaller than the previous error by more than the offset. The points where this switching occurred are marked with diamonds, and the corresponding robot locations are shown in the adjacent images (Color figure online)

4.3 Demonstration of the online grasp planning

We conducted experiments to evaluate our grasp adaptation approach, in which a human moves the test objects while they are being tracked. Figure 6 illustrates the steps during these experiments. The supplementary video illustrates the reported results in more detail. The weighting coefficient a (from Eq. (8)) and the switching threshold \(\delta _c\) (from Eq. (11)) are set to 0.4 and 0.2, respectively, for these tests. In the next section, we also analyse the performance with different values of \(\delta _c\).

Fig. 8

Analysis of planner switching while handling coffee box

Table 1 Statistical results for repeated trials

First, the model point cloud of the object is obtained as explained in Sect. 2.2 and is used for building the set of initial grasp hypotheses. It is worth noting that when multiple objects are present in the scene, the query object, i.e., the object to be followed and grasped, is selected manually by a human operator using our software interface; when a single object is present, the human operator directly allows the algorithm to proceed with the next steps. Out of the generated hypotheses, the first 20 are selected for these tests. In the next section, we analyse the effect of the number of hypotheses. Once visual tracking starts, the selected grasps are adapted to the changing object pose so that they remain applicable for the current object pose. The grasp set includes the grasps that correspond to the lowest error from step 14 in Algorithm 1, and this set is shown with gray wire-frames in Fig. 6. The transparent blue bounding box corresponds to the region where the grasps should be applicable. The red wire-frame corresponds to the best feasible grasp based on the error (given by Eq. (9)) and the blue one shows the next best feasible configuration. The last column in the figure shows example grasps that were executed during the trials. From our experiments, we found that one of the test objects, the bar (labeled 3 in Fig. 3), is not suitable for these experiments. Results with the bar are shown in the last row of Fig. 6, where it can be seen that visual tracking failed during rotation due to the object’s symmetry and lack of distinguishable depth texture, leading to wrong grasp poses. Since no tracker re-initialisation is used, we consider this case a failure.

The switching process while executing the current grasp trajectory, i.e., while the arm follows the approach waypoint, is illustrated for two different objects in Figs. 7 and 8. It can be observed that whenever large errors appear, the global planner is triggered and the error is reduced immediately in each trial, which validates our approach. In this way, the grasp set is updated with applicable grasps. It is worth mentioning that the IK solver from Sect. 3.3.1 is integrated with a collision expert, and consequently no self-collisions were observed during the trials.

We repeated the experiments ten times for each test object, where a human arbitrarily moved the object in the robot workspace for about half a minute on average. Table 1 summarizes the results for the first three trials of these experiments, listing the number of switches between the local and global planners, and the success rates over all ten trials. The resulting number of switches between the planners demonstrates the re-planning capability of the proposed approach. One trial with the can failed due to a visual tracking failure caused by self-occlusion, and two experiments with the blue jug failed because rotation of the object caused the handle to reappear after being hidden from the camera, which led to wrong grasp poses as the estimated object pose was inverted. It can also be noted that the number of switches for the lego toy is comparatively high due to the complexity of its surface model. Overall, these results show that once the initial grasps are chosen, the proposed approach can find feasible configurations in the workspace, provided the objects can be visually tracked continuously. Detailed results can also be found at https://youtu.be/kzG-TxT4wd8.

Fig. 9

a Tracks used by the human operator to move the object. The objects are rotated along the black line and are translated along the metallic scale (30 cm both forward and backward). b–d Illustration of the number of selected hypotheses for the dishwash object: b first 10, c first 20 and d first 30

4.4 Analysing the effect of selected grasp hypotheses

In this section, we report the experiments performed to analyse the effect of the number of grasp hypotheses used for following and grasping moving objects. The tests are conducted in a semi-controlled environment, where a human moves the objects in a systematic way along the two tracks shown in Fig. 9. The objects are rotated along the black line and are translated along the metallic scale (30 cm both forward and backward). For these tests, three different numbers of hypotheses are evaluated: the first 10, first 20 and first 30 (out of the total generated hypotheses), as shown in Fig. 9b–d. Three test objects are selected for these experiments based on the performance achieved in the previous tests and on shape complexity: big can (labelled 6 in Fig. 3; symmetric shape), dishwash (labelled 1 in Fig. 3; moderately complex shape) and lego toy (labelled 7 in Fig. 3; highly complex shape). On average, each trial took 3 min and 10 s. The values of a and \(\delta _c\) remain the same as in the previous experiments. Table 2 summarises the obtained results. An important conclusion can be drawn from these results: as the number of grasp hypotheses increases, the number of switches increases, i.e., the system is more flexible in tracking (following) a moving object. It can also be seen that the average task space error decreases as the number of grasp hypotheses increases, i.e., the system tries to track the object as closely and smoothly as possible through continuous replanning. Although the system performance is convincing, we believe that automatic selection of the best hypotheses from those available is required, which we plan to integrate in the future.

4.5 Analysing the effect of switching threshold \(\delta _c\)

In this section, various experiments are performed to analyse the effect of the switching threshold on the task performance. As in the previous section, these tests are performed in a semi-controlled manner and the same test objects are used. The number of grasp hypotheses is fixed to the first 20 and a weighting coefficient of \(a = 0.4\) is used. For the analysis, we conduct experiments using four different thresholds: 0.2, 0.4, 0.6 and 0.8. The obtained results are summarised in Table 3. From the results, it can be clearly seen that the system is more dynamic with a smaller threshold. Interestingly, however, the task space error does not follow any clear trend. Two conclusions can be drawn from these results. On the one hand, the task space error depends more on the number of grasp hypotheses provided, which is also consistent with the previous results. On the other hand, given an ample number of grasp hypotheses (20 in this case), the system tends to follow the object locally, keeping the error low.

Table 2 Analysis of the selected grasp hypotheses

5 Conclusions

This paper presented a novel approach for adaptive grasping and trajectory planning, which enables a robot arm and hand to dynamically track and grasp objects moving along arbitrary 6DoF trajectories. For a moving object whose pose is continuously updated by a particle filter-based 3D visual object tracker, the proposed method evaluates online a set of pre-generated grasp hypotheses (provided by a learned generative grasp planner), enabling the robot to maintain a suitable pre-grasp pose with minimum task space error while following the object throughout its motion. The task space error is incorporated into the robot’s motion planning through an iterative numerical optimisation-based IK solver. The motion planning associated with the proposed method comprises dynamic switching between local and global planners, which enables the robot to track the object as closely and smoothly as possible by continuous re-planning.

Table 3 Analysis of the switching thresholds

The proposed method has been demonstrated in an application in which a human moves an object in arbitrary ways before handing it over to the robot, as might be necessary for collaborative human–robot co-working. The first set of experiments was conducted with a number of different objects of diverse shapes, moved along arbitrary 6DoF trajectories, to analyse the performance in terms of repeatability. The obtained results suggest that the method is highly repeatable in tracking and grasping the objects, provided that the object pose can be visually tracked. They also demonstrate how the grasps are adapted, by dynamically switching between local and global trajectory planners, for objects with a variety of shapes. The second set of experiments was conducted in a semi-controlled environment to analyse the effect of various parameters on the proposed approach. The results obtained from these tests suggest that the system performs more smoothly with an increase in the number of initially selected grasp hypotheses or with a small switching threshold. Altogether, these results clearly indicate the robustness of our proposed online grasp planning method.

Our ongoing work is directed towards increasing the speed of the visual tracking by using a GPU implementation. Additionally, there are several possible directions for future research, which include integrating a gaze controller (Marturi et al. 2015) with tracking for hand-eye coordination, improving the switching approach between local and global trajectory planners, and improving the detection and selection of candidate grasp poses. We will also consider alternative applications, such as grasping stationary objects from robot arms mounted on moving vehicles.