Autonomous Robots, Volume 36, Issue 4, pp 309–330

Stable grasping under pose uncertainty using tactile feedback

DOI: 10.1007/s10514-013-9355-y

Cite this article as:
Dang, H. & Allen, P.K. Auton Robot (2014) 36: 309. doi:10.1007/s10514-013-9355-y

Abstract

This paper deals with the problem of stable grasping under pose uncertainty. Our method uses tactile sensing data to estimate grasp stability and make necessary hand adjustments after an initial grasp is established. We first discuss a learning approach to estimating grasp stability from tactile sensing data. This estimator can be used as an indicator of the stability of the current grasp during a grasping procedure. We then present a tactile-experience-based hand adjustment algorithm that synthesizes a hand adjustment and optimizes the hand pose to achieve a stable grasp. Experiments show that our method improves grasping performance under pose uncertainty.

Keywords

Grasping · Uncertainty · Robustness · Tactile sensing

1 Introduction

Robust grasping is one of the most important capabilities a robot is expected to have. Successful robotic grasping establishes the first step for a robot to physically interact with its environment and accomplish other higher level object manipulation tasks.

To enable robotic grasping, one existing approach is to decompose a grasping procedure into two main stages: planning and execution (Popovic et al. 2010; Saxena et al. 2008; Berenson and Srinivasa 2008; Goldfeder and Allen 2011). The planning stage is usually done in simulation with the 3D information extracted from a perception system. A stable grasp, parameterized by the hand posture and the hand-object relative pose, is then synthesized. In the execution stage, the planned grasp is sent to a path planner to generate a collision-free trajectory, and the robot moves along the newly generated trajectory to the grasping pose. These methods usually use geometric models of the objects to be grasped in the planning stage. However, since grasp planning is done in a simulated world that does not exactly model the actual workspace, due to imperfect perception and robot calibration, the executed grasps can end up unstable, making these methods sensitive to pose uncertainty. Figure 1 shows two examples where stable grasps are not achieved by simply going through the planning and execution stages. Due to pose uncertainty, an executed grasp can end up perturbing the object, pushing it away, or even knocking it over. None of these outcomes is desirable for a robotic grasping task.
Fig. 1

Execution of planned-to-be stable grasps. a, d Two grasps planned in simulation that are stable. b, e Snapshots of successful execution of the two planned grasps which were able to lift up the object stably. c, f Two failure cases of execution due to pose error where the executed grasps were not able to lift up the object

Another approach is to treat grasping as a control problem where a set of control laws are applied to adjust the hand to achieve some preferred contact configuration on the object, e.g., antipodal grasps (Jia 2000; Platt 2007; Wang et al. 2007). These methods usually utilize the actual sensing data from force, torque, or tactile sensors; so they do not require any specific hand-object relative pose and are more robust under pose uncertainty. Methods along this direction are also usually object model free. Since the control laws are relatively computationally inexpensive, these methods run fast. However, a major issue is that these methods either ignore the hand kinematics or assume relatively simple hand designs, such as parallel jaw grippers and their simple variants. So, it is difficult to extend these methods to complex hand designs which have more dexterity in object manipulation tasks.

Both approaches have their own benefits as well as disadvantages. In our work, we attempt to unify the power of the two different categories of grasping methods by developing a grasping pipeline that starts in a planning-based style but adds a closed-loop hand adjustment procedure after grasp execution. Specifically, we use a planning-based approach to establish an initial grasping pose on a known object and then switch to a control-styled method to adjust the hand locally on the object and optimize the grasp configuration. The intuition behind this design is twofold. First, vision is capable of providing global geometric information which is accurate enough to generate an initial grasping pose. Second, when the hand is at the planned grasping pose around the object, vision may have difficulty extracting accurate information about the hand-object relative pose due to potential occlusion and long distance. At this point, tactile sensing data can start to play an important role in capturing the direct interaction between the object and the hand in real time. They provide important information concerning the contact configuration, which determines the wrench distribution of the grasp and thus the stability of the grasp. Tactile sensing data can also indicate the relative pose between the hand and the object, e.g., (Platt et al. 2011; Pezzementi et al. 2011; Petrovskaya et al. 2006). Using tactile sensing data, it becomes possible to make the necessary hand adjustments to achieve stable grasping poses. Figure 2 outlines the components of our grasping pipeline. Initially, a grasp is applied using the perception and grasp planning and execution modules, which form a conventional planning-based grasping pipeline. Once the initial grasp is established, the stability of the grasp is estimated by the grasp stability estimation procedure. If the grasp is classified as unstable, a hand adjustment is then synthesized and applied in the hand adjustment procedure.
In the following sections, we will discuss the components of Fig. 2 in detail. Part of this work has been published in Dang and Allen (2012).
Fig. 2

A grasping pipeline with a regular planning-based grasp execution procedure and a post-execution grasp adjustment procedure including grasp stability estimation and hand adjustment. A typical planning-based grasping pipeline usually contains only the first two components perception and grasp planning and execution. Two thresholds \(t_1, t_2, 0<t_1 < t_2\) were used to evaluate the closeness between two grasps. We will discuss them in Sect. 5.2.2

2 Previous work

2.1 Planning-based grasping

Planning-based grasping pipelines are probably the most widely used framework in the robotics community (Saxena et al. 2008; Popovic et al. 2010; Berenson and Srinivasa 2008; Goldfeder and Allen 2011). Some planning-based algorithms require the object geometry to be known a priori. Ciocarlie et al. proposed the Eigengrasp idea for grasp planning using an articulated hand and an object model (Ciocarlie and Allen 2009; Ciocarlie et al. 2007a). This method effectively reduces the dimension of the search space for stable grasps and results in a faster search process to find force-closure grasps. Berenson and Srinivasa (2008) proposed a method to generate collision-free force-closure grasps for dexterous hands in cluttered environments. Przybylski et al. (2011) introduced a method to use the medial axis transform as an object representation for grasp planning. Roa et al. (2012) designed an algorithm to compute power grasps for hands with a kinematic structure similar to human hands. Miller et al. (2003) used shape primitives to represent objects and plan stable grasps with different rules associated with shape primitives. Goldfeder et al. (2007) extended this idea and proposed a grasp planning method using a shape decomposition tree of an object. Along the same direction, Huebner and Kragic (2008) approximated 3D objects with box primitives and planned grasps using the box-based shape approximation.

When full object geometry cannot be obtained in advance, another group of algorithms can be used for grasp planning, which require only partial knowledge of the object geometry. Saxena et al. (2007, 2008) used synthesized image data to train a classifier to predict grasping points based on image features such as edges, textures, and colors. Similarly, Bohg and Kragic (2010) used shape context feature from synthesized 2D images to learn grasping points. In addition to 2D images, some variant methods along similar directions exploit 3D range data to generate grasp candidates (Le et al. 2010; Jiang et al. 2011; Klingbeil et al. 2011; Rao et al. 2010). Goldfeder et al. (2009a, b) built a database of grasps on different shapes and developed a grasping pipeline that utilizes partial data to register range data into shapes in the database and synthesize grasp candidates. Popovic et al. (2010) proposed a method to execute stable grasping on unknown objects based on co-planarity and color information. A similar approach is taken by Kootstra et al. (2012) who use edge and texture information from the scene to generate grasp candidates. Boularias et al. (2011) used Markov Random Fields to learn grasping points for similar objects. El-Khoury and Sahbani used Gaussian curvature as an indicator of separation points to segment point clouds and approximated the segments with super-quadratic primitive shapes. A neural network was then trained to learn to select appropriate segments for grasping (El-Khoury and Sahbani 2010). Geidenstam et al. (2009) approximated 3D shapes with bounding boxes and trained a neural network to learn stable grasps based on the box representation. Horowitz and Burdick (2012) considered grasp and manipulation planning together as a trajectory optimization problem to solve.

2.2 Control-styled grasping

Antipodal grasps and their variants are a type of grasp that many control-styled methods try to achieve. Jia (2000) used tactile sensing to locate the contacts while rolling the fingers on an object and applied a grasp once two antipodal contacts were achieved. López-Coronado et al. (2002) applied a neural model to learn a mapping from tactile sensing data to motor control and used this mapping to center an object with respect to a parallel jaw gripper during grasping. Hsiao et al. (2010) developed a reactive algorithm based on tactile and force sensor data to locally adjust the pose of a parallel jaw gripper and grasp objects. Wang et al. (2007) proposed a control algorithm which uses force and torque information to drive the search process for stable grasps. Also aiming at antipodal grasps but with a multi-fingered hand, Bierbaum and Rambow (2009) proposed a method to generate antipodal grasp affordances based on reconstructed faces of an object through tactile exploration. Platt (2007) proposed a method to learn grasping strategies based on contact relative motions and examined this idea in 2D planar grasping scenarios with a Robonaut hand.

In addition to antipodal grasps, Coelho introduced a controller which considers contact position and normal feedback to synthesize contact configurations for statically stable grasps that involve \(k\) contacts (Coelho and Grupen 1997). Mishra and Mishra (1994) analyzed grasping with a 2 or 3 fingered hand and developed a reactive algorithm to achieve ideal contact configurations on an object.

2.3 Grasping under uncertainty

There has been previous work on robust grasping under uncertainty. In order to plan stable grasps which display more robustness under uncertainties, Berenson et al. (2009) used the task space regions (TSR) framework to represent pose uncertainty for planning grasp candidates that are most likely to succeed. Brook et al. (2011) analyzed uncertainty in both object identity and object pose for planning the best grasping pose. Stulp et al. (2011) designed a framework to generate robust motion primitives by sampling the actual pose of the object from a distribution that represents the state estimation uncertainty. Similarly, Weisz and Allen (2012) proposed a new quality metric to measure the robustness of a grasp under object pose uncertainty. Kim et al. (2012) considered dynamic movements of the object being manipulated during grasp planning to generate optimal grasp candidates. Along the same direction, Dogar and Srinivasa (2011) analyzed push-grasping to deal with environmental uncertainties in cluttered scenes.

In addition to dealing with uncertainty in the grasp planning stage, researchers have been considering grasping as a reactive procedure and using tactile sensing as sensory feedback to improve grasping performance in the execution stage. Platt et al. (2010) proposed three variations on null-space grasp control which combine multiple grasp objectives to improve a grasp in unstructured environments. Felip et al. (2013) proposed a paradigm for modeling and executing reactive manipulation actions, which makes knowledge transfer to different embodiments possible while retaining the reactive capabilities of each embodiment. Bekiroglu et al. (2011) used hidden Markov models (HMMs) to estimate grasp stability from a series of tactile data. Based on this work, Laaksonen et al. (2012) proposed a framework to use on-line sensory information to refine object pose and modify the grasp accordingly. Hsiao et al. (2011) used tactile sensing data to estimate hand-object relative pose for synthesizing the next hand trajectory so that a specific grasp can be achieved. Morales et al. (2007) used tactile data to cope with uncertainty in the execution of a manipulation task. Zhang and Trinkle (2012) utilized both vision and tactile data to improve object tracking for grasping. Hebert et al. (2012) combined tactile data with other sensory data to provide object tracking for both grasping and manipulation under uncertainty. Nikandrova et al. (2012) proposed a probabilistic framework to use on-line sensory information for grasp planning. Jiang and Smith (2012) introduced seashell effect pre-touch sensing to use proximity sensing data for grasp control and surface reconstruction. With a reconstructed surface, a general grasp planning algorithm could take place to generate stable grasp hypotheses for execution.

In our work, we also consider grasping as a reactive procedure as illustrated in Fig. 2. We train our grasp stability estimator using a set of simulated stable grasps. This approach is similar to the previous work by Bekiroglu et al. (2011), while we use a different feature which focuses on encoding the distribution of grasp contacts. We exploit a simulation technique to generate a set of stable grasps from which hand adjustments can be synthesized. This approach differentiates us from previous work by Platt et al. (2010) where control rules are explicitly formulated. This is also different from the previous work by Hsiao et al. (2011) and Laaksonen et al. (2012) where hand adjustments are calculated based on analyzing the pose error of the object via tactile feedback. In the hand adjustment procedure of our pipeline, we try to avoid introducing disturbance to the object during a grasping process. This consideration comes from a different perspective compared to the work by Dogar and Srinivasa (2011) which analyzed the push action that intentionally moves the object into the palm to form a stable caging grasp.

3 Grasp stability estimation

We now describe the grasp stability estimation procedure of Fig. 2. When a grasp is established on an object, tactile sensors capture the valid contacts of the current grasp. This information provides us with a way to infer the stability of the grasp. This section discusses our learning method that uses tactile sensing data to predict the stability of a grasp.

3.1 Extract tactile contacts

When visual information is not available, tactile feedback from the hand is crucial in object grasping and manipulation tasks. It gives us information about the object’s local geometry which is difficult to obtain through vision alone. Tactile sensors capture the contacts between the surface of the hand and the surface of the object. Tactile feedback indicates which sensor cells are in contact with the object and which are not. It also provides intensity values that represent the forces sensed at these activated sensors. With the angle values for the joints of the hand, we can also use forward kinematics to determine both the location and the orientation of each sensor cell. So, we can utilize the tactile feedback to approximate the contact locations and orientations.

To represent the location and the orientation of a sensor cell, we want to use a coordinate system that is local to the hand and is consistent across different grasps. We choose the coordinate system attached to the palm as the reference coordinate system. Given a set of \(n\) joint angles of a grasp, \(\mathcal{J } = [j_1, j_2,\ldots j_n]\), we write out the location and the orientation of the \(k^{th}\) sensor cell on the \(i^{th}\) link in a homogeneous transformation matrix as follows:
$$\begin{aligned} T_{palm}^{sensor_{ik}}(\mathcal{J })=T_{palm}^{link_i}(\mathcal{J }) \cdot T_{link_i}^{sensor_k} \end{aligned}$$
(1)
where \(T_{palm}^{link_i}\) denotes the transformation between the link \(i\) and the palm; it is determined by the joint values \(\mathcal{J }\) and the hand kinematics for each grasp; \(T_{link_i}^{sensor_k}\) is the transformation between the \(i^{th}\) link and the \(k^{th}\) sensor cell on this link; it is determined by the sensor cell configuration and is a constant for every grasp. In the end, we can rewrite the matrix \(T_{palm}^{sensor_{ik}}\) in the form of \(c_i = <p \in R^3, o \in S^3>\), where \(p\) specifies the 3-D position and \(o\) is a quaternion to specify the orientation.

Using the location and the orientation of each sensor cell that is activated due to a contact, we can determine the configuration of the contacts involved in a grasp. It is worth noting that there is error in representing the actual contact locations and orientations using this method because each sensor cell has finite dimensions and any contacts residing within the same sensor cell will be indistinguishable.
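As a concrete toy illustration of Eq. (1), a sensor cell's pose in the palm frame can be computed by chaining homogeneous transforms. The one-joint kinematic chain and all dimensions below are invented for the example; they are not the Barrett hand's actual kinematics:

```python
import numpy as np

def rot_z(theta):
    """Homogeneous transform for a rotation about the z-axis (a revolute joint)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def translate(x, y, z):
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def sensor_pose_in_palm(T_palm_link, T_link_sensor):
    """Eq. (1): chain the joint-dependent link transform with the fixed
    link-to-sensor transform, then split into position and rotation."""
    T = T_palm_link @ T_link_sensor
    position = T[:3, 3]
    rotation = T[:3, :3]  # would be converted to a quaternion for the <p, o> form
    return position, rotation

# Toy chain: one revolute joint at 90 degrees followed by a 5 cm link,
# with a sensor cell offset 1 cm along the link (illustrative numbers).
T_palm_link = rot_z(np.pi / 2) @ translate(0.05, 0.0, 0.0)
T_link_sensor = translate(0.01, 0.0, 0.0)
p, R = sensor_pose_in_palm(T_palm_link, T_link_sensor)
```

Here `T_palm_link` plays the role of \(T_{palm}^{link_i}(\mathcal{J})\) and `T_link_sensor` the constant \(T_{link_i}^{sensor_k}\).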

3.2 Compute grasp features: a bag-of-words model

Bag-of-words models are widely used in the field of natural language processing (NLP) (Harris 1970). They are also known as bag-of-features models in the field of computer vision. In the field of NLP, bag-of-words models use a dictionary to represent a document without considering the order of the appearance of the words in the document. In the field of computer vision, an image is treated similarly as a document, where the visual features of an image take the role of the words in a document.

By the same analogy, we can transfer this idea to the context of robotic grasping. A grasp contains a set of contacts just as a document consists of a number of words. If we treat a grasp as a document and a contact as a word, it is reasonable to use a bag-of-words model to describe a grasp in a similar way as a bag-of-words model does a document.

3.2.1 A contact dictionary

In order to use the bag-of-words approach, we need to build a contact dictionary which represents the space of the potential contacts. We write it mathematically as a set of contacts \(\hat{\mathcal{C }}=[{\hat{c}}_1, {\hat{c}}_2, \ldots {\hat{c}}_p]\). It is impractical, if not impossible, to collect all the possible contacts that can appear in a grasp. Thus, we need a reasonable discretization of the space within the hand’s coordinate system. Considering the kinematics of a robotic hand, we see that some regions within the hand’s local coordinate system are more likely than others to contain a contact. Thus, by using a set of representative contacts from these regions as a dictionary, we gain both a statistically sound representation of the contact space and a low dimensionality of the dictionary, which determines the dimensionality of the grasp features.

In Sect. 5.1.2, we will describe how we learn a set of representative contacts using a clustering algorithm from a set of simulated grasps on commonly seen objects. Figure 3 illustrates an exemplar contact dictionary. The representative contacts in the contact dictionary are the cluster centers overlayed on a Barrett hand, which is a widely used robotic hand. The spread angle between fingers 1 and 2 of the Barrett hand is set manually, solely to give a better sense of the hand’s workspace. The centers of the clusters outline the reaching space of each finger. In Fig. 3a, the contact spaces of fingers 1 and 2 of the Barrett hand display clear symmetry. This agrees with the symmetric mechanical design of the two fingers.
Fig. 3

Cluster centers of contacts overlayed on a Barrett hand, which is a widely used robotic hand. Spheres are located at the centers of each cluster. The clusters contain 199,835 contacts collected from a training set of 24,640 grasps, which will be discussed in Sect. 5.1.2
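Although the actual dictionary is learned in Sect. 5.1.2, the clustering step can be sketched with a plain k-means over 3-D contact locations. The two synthetic contact blobs below are illustrative stand-ins for real tactile data, not data from the paper:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means over 3-D contact locations; the resulting centers
    serve as the contact dictionary C_hat = [c_1, ..., c_p]."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each contact to its nearest center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

# Synthetic contacts around two fingertip-like regions (illustrative only).
rng = np.random.default_rng(1)
contacts = np.vstack([
    rng.normal([0.05, 0.10, 0.0], 0.005, (200, 3)),
    rng.normal([-0.05, 0.10, 0.0], 0.005, (200, 3)),
])
dictionary = kmeans(contacts, k=2)
```

In the paper the input would be the 199,835 contacts from the training grasps, and `k` the chosen dictionary size `p`.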

3.2.2 Grasp feature vectors

The set of cluster centers models the space of the contacts on the fingers and the palm in a highly discretized form. With this set of cluster centers, we use the distribution of the contacts among the cluster centers as the feature vector for a grasp.

Given a contact dictionary which has \(p\) cluster centers \(\hat{\mathcal{C }}=[{\hat{c}}_1, {\hat{c}}_2, \ldots {\hat{c}}_p]\) and a grasp \(\mathcal{G }\) which consists of \(q\) contacts \(\mathcal{C }_\mathcal{G }=[c_1, c_2, \ldots c_q]\), we calculate the distribution vector of the contacts of grasp \(\mathcal{G }\) with respect to \(\hat{\mathcal{C }}\) as follows:
$$\begin{aligned} \mathcal{D}(\mathcal{C}_\mathcal{G}, \hat{\mathcal{C}}) = \sum _{i = 1}^{q}\mathcal{H}(c_i,\hat{\mathcal{C}}) \cdot \frac{f_{c_i}}{S_{c_i}} \end{aligned}$$
(2)
where \(f_{c_i}\) is the force value sensed at the tactile sensor cell corresponding to contact \(c_i\), \(S_{c_i}\) is the total force sensed from all the sensor cells on the sensor pad that contact \(c_i\) is on, and \(\mathcal{H}(c_i,\hat{\mathcal{C}}) = [h_1, h_2, \ldots h_p]\) is a \(p\)-dimensional vector that stores the similarity values between contact \(c_i\) and each cluster center in \(\hat{\mathcal{C}}\). Each element is computed as:
$$\begin{aligned} h_{j} = \exp \left( {-\frac{||c_i - {\hat{c}}_j||^2}{\sigma ^2}}\right) \end{aligned}$$
(3)
where \(c_i\) and \({\hat{c}}_j\) are both 3-dimensional vectors storing contact locations and \(\sigma \) is a parameter set manually. \(h_j\) measures the similarity between two contact locations. For a contact that is far from a cluster center, the corresponding Euclidean distance is large, resulting in a small similarity value. The parameter \(\sigma \) controls how quickly the similarity values decrease as the distances grow.
A distribution vector \(\mathcal{D}(\mathcal{C}_\mathcal{G}, \hat{\mathcal{C}})\) is built from a summation over a varying number of contacts. Thus, we normalize the distribution vector by scaling it down by the number of sensor pads which have tactile contacts. Mathematically, for a grasp \(\mathcal{G }\), the normalized distribution vector is calculated as:
$$\begin{aligned} \hat{\mathcal{D }}_i = \frac{\mathcal{D }_i}{|\mathcal{P }_\mathcal{G }|} \end{aligned}$$
(4)
where \(\mathcal{D}_i\) denotes the \(i^{th}\) element of the vector \(\mathcal{D}(\mathcal{C}_\mathcal{G}, \hat{\mathcal{C}})\) and \(|\mathcal{P }_\mathcal{G }|\) is the number of sensor pads with tactile contacts. This rescaled distribution vector is the final feature vector we use to describe a grasp in our work.
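A minimal sketch of Eqs. (2)-(4), assuming each contact comes with a per-cell force value and the id of the sensor pad it lies on (the function name, array shapes, and the demo numbers are our own):

```python
import numpy as np

def grasp_feature(contacts, forces, pad_ids, centers, sigma=0.01):
    """Eqs. (2)-(4): force-weighted soft assignment of contacts to the
    dictionary centers, normalized by the number of active sensor pads.

    contacts: (q, 3) contact locations, forces: (q,) per-cell forces,
    pad_ids: (q,) id of the sensor pad each contact lies on,
    centers: (p, 3) contact dictionary.
    """
    D = np.zeros(len(centers))
    # S_{c_i}: total force on the pad each contact belongs to
    pad_totals = {pad: forces[pad_ids == pad].sum() for pad in set(pad_ids)}
    for c, f, pad in zip(contacts, forces, pad_ids):
        h = np.exp(-np.linalg.norm(c - centers, axis=1) ** 2 / sigma ** 2)  # Eq. (3)
        D += h * (f / pad_totals[pad])                                      # Eq. (2)
    return D / len(pad_totals)                                              # Eq. (4)

# Demo: two contacts on two different pads, near the two dictionary centers.
centers = np.array([[0.0, 0.0, 0.0], [0.02, 0.0, 0.0]])
contacts = np.array([[0.0, 0.0, 0.0], [0.021, 0.0, 0.0]])
forces = np.array([1.0, 3.0])
pads = np.array([0, 1])
feature = grasp_feature(contacts, forces, pads, centers)
```

Per-pad force normalization makes each pad contribute a unit of mass regardless of how hard it presses, which matches the \(f_{c_i}/S_{c_i}\) term in Eq. (2).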

3.3 Learning grasp stability

We take a supervised learning approach to the problem of grasp stability estimation. Section 5.1 describes how in practice we obtain a training set of grasps, label each training sample according to standard grasp quality metrics, and train an SVM classifier. Theoretically, a training set of grasps contains \(N\) instance-label pairs \((x_i, y_i), i=1,\ldots ,N\) where \(x_i \in \mathcal R ^{p}\) is a grasp feature vector as we discussed in the previous section, \(y_i \in \{-1,1\}\) is a label specifying whether this grasp is stable \((1)\) or unstable \((-1)\), and \(N\) is the number of training samples. Given this training data, an SVM can be trained and stored in advance. Using a trained SVM, we can classify a grasp and predict its stability. More details about SVM can be found in previous work by Cortes and Vapnik (1995) and Suykens et al. (2010).
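In practice an off-the-shelf SVM package would be used for this step. As a self-contained stand-in, the sketch below trains a linear SVM with Pegasos-style sub-gradient descent on \((x_i, y_i)\) pairs with \(y_i \in \{-1, 1\}\); the toy feature vectors are invented for the example:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal linear SVM (Pegasos-style sub-gradient descent).
    A real pipeline would use a standard SVM library instead."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (X[i] @ w + b)
            w *= (1 - eta * lam)         # shrink from the regularizer
            if margin < 1:               # hinge-loss sub-gradient step
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    """+1 = stable, -1 = unstable."""
    return 1 if x @ w + b >= 0 else -1

# Toy 2-D training set standing in for grasp feature vectors.
X = np.array([[2.0, 0.0], [3.0, 1.0], [2.5, -1.0],
              [-2.0, 0.0], [-3.0, -1.0], [-2.5, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
```

The paper's classifier additionally benefits from kernels, which any full SVM implementation provides.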

4 Hand adjustment

We now describe the hand adjustment procedure of Fig. 2. The question this section attempts to answer is: Given a grasp, which is classified as unstable by the grasp stability estimator, what hand adjustments should the robot make to achieve a stable grasp? Figure 4 is one example of hand adjustment on a Snapple bottle to illustrate how hand adjustments help achieve stable grasps. The hand starts at a grasp where it barely touches one side of the Snapple bottle, thus failing to establish opposing contacts. In addition, the palm is not aligned with the vertical direction of the Snapple bottle, resulting in contact surfaces with very limited area. This situation is difficult to detect with a vision system at a distance, since the pose offset is subtle. With tactile sensing, however, the relative hand pose difference is captured well. After two steps of hand adjustment, the grasp is adjusted such that the contacts oppose each other and the contact surface is increased.
Fig. 4

Hand adjustment with tactile experience, an example that illustrates the progression of hand adjustment. Initially, the grasp (left column) barely touches one side of the bottle and the finger surface does not align well with the surface of the bottle. After two hand adjustments, the final grasp (right column) has opposing contacts and the finger surface aligns with the surface of the bottle

A hand adjustment specifies the changes to the current grasp. It consists of changes in hand location, orientation, and the selected degrees of freedom (DOF) to control. We can write it compactly as
$$\begin{aligned} Adj = <p, o, s> \end{aligned}$$
(5)
where \(p \in R^3\) is a 3-D vector specifying the new hand position in the current hand coordinate system, \(o \in S^3\) is the new hand orientation in the current hand coordinate system represented as a quaternion, and \(s \in R^{|S_{dof}|}\) is a vector storing value changes for the set of selected DOFs, \(S_{dof}\), which we want to control in a hand adjustment.
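Eq. (5) can be represented as a record and applied by composing a rigid transform with the current hand pose. This is a sketch under the assumption that \(Adj\) is expressed in the current hand frame, as the text states; the (w, x, y, z) quaternion convention is our choice:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HandAdjustment:
    """Eq. (5): Adj = <p, o, s> in the current hand coordinate system."""
    p: np.ndarray   # (3,) position change
    o: np.ndarray   # (4,) quaternion (w, x, y, z) orientation change
    s: np.ndarray   # value changes for the selected DOFs

def quat_to_matrix(q):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def apply_adjustment(T_world_hand, adj):
    """Compose the adjustment (expressed in the hand frame) with the
    current hand pose to get the new target hand pose."""
    T_adj = np.eye(4)
    T_adj[:3, :3] = quat_to_matrix(adj.o)
    T_adj[:3, 3] = adj.p
    return T_world_hand @ T_adj
```

The selected-DOF changes `s` would be added to the current joint targets separately, since they live in joint space rather than SE(3).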

As illustrated in Fig. 2, to achieve reasonable hand adjustments, we first compute a tactile experience database which consists of a set of stable grasps and use these grasps as a reference to synthesize a hand adjustment. The tactile contacts extracted using forward kinematics and tactile sensor readings are used in querying the tactile experience database for stable grasps with similar tactile contacts. If the stable grasps with similar tactile contacts are successfully retrieved, hand adjustment parameters are synthesized and sent to control the hand to make local movements. If there is no similar tactile experience in the database, the local surfaces of the object at contact are reconstructed by moving the hand around to collect tactile contacts on the surface and stable grasps are planned based on the reconstructed local geometry.

4.1 Tactile experience database

A tactile experience database consists of stable grasps and their corresponding tactile contacts. It provides precomputed knowledge about the potential tactile contacts a stable grasp should contain. A grasp, \(\mathcal{G }\), in the tactile experience database can be considered as \(\mathcal{G } = \{\mathcal{P }, \mathcal{J }, \mathcal{T }, \mathcal{C }, \mathcal{L }\}\) where
  • \(\mathcal{P } = <p, o>, p \in R^3, o \in S^3\) specifies the hand pose in the object coordinate system, including the position and orientation of the hand. The orientation is represented using quaternions.

  • \(\mathcal{J } = \{j_1, j_2, \ldots , j_N\}, j_i \in R\) records the \(N\) joint angles of the grasp. As an example, for a Barrett hand, we can choose \(N=7\) and record the 7 joint values.

  • \(\mathcal{T } = \{t_1, t_2, \ldots , t_K\}, t_i \in R\) is the set of \(K\) tactile sensor readings. As an example, for a Barrett hand, there are 24 tactile sensors on each fingertip and on the palm. Since it has three fingers and one palm, \(K = 96\).

  • \(\mathcal{C } = \{c_1, c_2, \ldots , c_M\}, c_i = <p_i, o_i>, p_i \in R^3, o_i \in S^3\) is the set of tactile contacts, indicating the locations, \(p_i\), and the orientations, \(o_i\) of the \(M\) activated tactile sensors.

  • \(\mathcal{L } = \{\mathcal{G }^l_i | \mathcal{G }^l_i = \{Adj,\mathcal{J },\mathcal{T },\mathcal{C }\}\}\) is the local tactile experience, which stores related information for perturbed grasps within the neighborhood of grasp \(\mathcal{G }\). Local experience can be used to locate a perturbed grasp within the neighborhood of the stable grasp from which the local experience was generated. \(Adj\) stores the inverse of the perturbation from the stable grasp to a perturbed grasp. Using this transformation \(Adj\), we can adjust a perturbed grasp to recover the corresponding stable one.

Section 5.2.2 describes how we compute this database on commonly grasped objects.
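An entry \(\mathcal{G } = \{\mathcal{P }, \mathcal{J }, \mathcal{T }, \mathcal{C }, \mathcal{L }\}\) might be stored as a simple record. The field names are our own; the shapes follow the Barrett-hand example above:

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class GraspRecord:
    """One tactile-experience entry G = {P, J, T, C, L}."""
    pose: tuple          # P: (position (3,), quaternion (4,)) in the object frame
    joints: np.ndarray   # J: N joint angles, e.g. N = 7 for a Barrett hand
    tactile: np.ndarray  # T: K raw sensor readings, e.g. K = 96
    contacts: list       # C: [(position (3,), quaternion (4,)), ...] of active cells
    local: list = field(default_factory=list)  # L: perturbed-grasp entries (Adj, J, T, C)
```

A database is then just a list of such records, queried with the distance function of Sect. 4.2.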

4.2 Query for stable grasps with similar tactile contacts

Once the set of tactile contacts is extracted from an actual grasp using forward kinematics, we query the tactile experience database for stable grasps that share similar tactile contacts. To this end, we define a distance function which measures the similarity between two grasps \(\mathcal{G }_1\) and \(\mathcal{G }_2\). This distance function considers both the tactile contact configuration and the hand posture of the two grasps. In our work, we only use the location of a contact in the distance metric. The distance metric can be expressed as
$$\begin{aligned} dist(\mathcal{G}_1, \mathcal{G}_2) = \frac{1}{2} \cdot \sum _{m=1}^{N_1}\min _n{\left( ||c^1_m - c^2_n||\right) } + \frac{1}{2} \cdot \sum _{m=1}^{N_2}\min _n{\left( ||c^2_m - c^1_n||\right) } + \alpha ||js_1 - js_2|| \end{aligned}$$
(6)
where \(c^i_m\) is the \(m^{th}\) contact of the grasp \(i,\,N_i\) is the number of contacts of grasp \(i\), and \(js_i\) is the joint values for the selected DOFs of the grasp \(i\). \(\alpha \) is a scaling factor for the Euclidean distance between the joint angles of the selected DOFs. The first two parts of the right side of the equation measure the Euclidean distance between the two sets of contacts in terms of their positions. The third part measures the difference between the joint angles for the selected DOFs. We also apply this function to measure the distance between a local tactile experience entry \(\mathcal{G }^l_i\) and a grasp \(\mathcal{G }\) using \(dist(\mathcal{G }^l_i, \mathcal{G })\) where the values of \(\mathcal{G }_1\) in Eq. 6 come from the grasp of \(\mathcal{G }^l_i\).
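Eq. (6) translates directly into a few lines of vectorized code. This is a sketch; as in the text, only contact locations enter the metric:

```python
import numpy as np

def grasp_distance(c1, c2, js1, js2, alpha=1.0):
    """Eq. (6): symmetric nearest-contact distance between the two
    contact sets plus a scaled joint-angle term.

    c1: (N1, 3) and c2: (N2, 3) contact locations; js1, js2: joint
    values of the selected DOFs."""
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)  # (N1, N2) pairwise
    contact_term = 0.5 * d.min(axis=1).sum() + 0.5 * d.min(axis=0).sum()
    joint_term = alpha * np.linalg.norm(js1 - js2)
    return contact_term + joint_term
```

Averaging the two one-sided nearest-neighbor sums keeps the metric symmetric even when the two grasps have different numbers of contacts.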

With this distance function, we query the tactile experience database for \(k\) nearest neighbors of an actual grasp using its tactile contacts. We also use this distance metric to decide whether there is any similar experience found in the database and whether an actual grasp is close enough to a stable grasp in the database, which correspond to the two decision diamonds in the hand adjustment procedure of Fig. 2. We describe the distance thresholds for both decision diamonds in Sect. 5.2.2.

4.3 Compute hand adjustment from experience

All the \(k\) nearest neighbors are stable grasps and they share similar tactile contacts with the actual grasp. In this case, it is reasonable to assume that the local geometry is similar where the contacts are established. Although the actual grasp shares similar tactile contacts with stable grasps, it is not close enough to be a stable one. However, it is possible that this grasp is away from a stable grasp by a small offset transformation. The goal of this step is to synthesize this offset transformation and generate a hand adjustment to optimize the grasp towards a stable one.

Algorithm 1 outlines the overall procedure of searching for a hand adjustment command in Fig. 2 using tactile experience. The idea is to use the tactile experience to locate the actual grasp around each of the \(k\) nearest neighbors (stable grasps) and synthesize a hand adjustment based on the offset transformations from them. The first step of this algorithm is to look into the tactile experience database and locate the top \(k\) stable grasps that share similar tactile feedback (Line 1). Since the actual grasp shares similar tactile feedback with these \(k\) stable grasps, it is likely to lie within a small neighborhood of some of them. In Lines 4–9, we look into the neighborhood of each of the \(k\) stable grasps and evaluate how well the actual grasp can be located within each neighborhood using the distance function of Eq. 6. This refined search provides detailed information about the pose of the actual grasp relative to each stable grasp. In Line 10, we select the stable grasp within whose neighborhood the actual grasp is best located. The weighted mean of the offset transformations of the perturbed grasps within this neighborhood is then calculated in Line 12 and returned as the hand adjustment.
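A minimal sketch of this control flow follows, under several simplifying assumptions: grasps are plain dictionaries, offsets are stored as translation vectors rather than full transformations, and `dist` is a crude centroid-based stand-in for the Eq. 6 tactile distance:

```python
import numpy as np

def dist(g1, g2):
    # stand-in for the Eq. 6 tactile distance: contact centroids only
    return float(np.linalg.norm(np.mean(g1["contacts"], axis=0) -
                                np.mean(g2["contacts"], axis=0)))

def synthesize_adjustment(actual, database, k=2):
    # Line 1: the k nearest stable grasps under the tactile distance
    nearest = sorted(database, key=lambda s: dist(actual, s["grasp"]))[:k]
    # Lines 4-9: how well each stable grasp's neighborhood of perturbed
    # grasps locates the actual grasp (smaller best distance = better fit)
    best = min(nearest,
               key=lambda s: min(dist(actual, p["grasp"])
                                 for p in s["neighborhood"]))
    # Lines 11-17 / Eq. 7: inverse-distance weighted mean of stored offsets
    w = np.array([1.0 / max(dist(actual, p["grasp"]), 1e-9)
                  for p in best["neighborhood"]])
    offsets = np.array([p["offset"] for p in best["neighborhood"]])
    return (w[:, None] * offsets).sum(axis=0) / w.sum()
```

Perturbed grasps whose tactile feedback nearly matches the actual grasp dominate the weighted mean, so the returned adjustment is essentially the stored offset that undoes the most similar recorded perturbation.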

The tactile experience used in Algorithm 1 can be precomputed using a predefined list of perturbations, as described in Algorithm 2. The idea is to sample perturbed grasps around each stable one and record their tactile feedback. The predefined list of perturbations covers the potential pose error between the hand and the object. In practice, we first define an uncertainty space and then uniformly sample it to generate a list of perturbations. In our work, we sample the wrist orientation, wrist position, and selected DOFs to generate these perturbed grasping poses. This sampling process provides the data required in Lines 2 and 3. In Lines 4–9, we perturb the hand according to each perturbation (Line 6) and record the tactile feedback and other related information (Line 7). After the information of all the perturbed grasps has been recorded, the local tactile experience is generated and returned.

Algorithm 3 describes how we compute a weighted transformation from a list of perturbed grasps. We use a weighted transformation because every perturbed grasp carries useful information for synthesizing a reasonable offset transformation, and we want to take all of them into account. In this algorithm, Lines 2–10 first decide whether there exists a stable grasp within whose neighborhood the actual grasp can be better located. If the actual grasp cannot be better located within the neighborhood of any of the \(k\) nearest neighbors, no hand adjustment can be synthesized at this step and an identity transformation is returned (Lines 8–10). Otherwise, the final weighted transformation is calculated in Lines 11–17 as the weighted mean of the offset transformations from the perturbed grasps as
$$\begin{aligned} \sum _{i}{\frac{weight_i}{\sum _{k}{weight_k}} \cdot Adj(\mathcal{G }^l_i)} \end{aligned}$$
(7)
where the weight, \(weight_i=1/dist(\mathcal{G }^l_i, \mathcal{G }_{x})\), is the inverse of the distance between a local tactile experience entry \(\mathcal{G }^l_i\) and the actual grasp \(\mathcal{G }_x\).

4.4 Explore local geometry

When the actual grasp is far from any stable grasp in the tactile experience database, no similar tactile experience will be found. In this situation, a local geometry exploration takes place to reconstruct the local geometry around each contact between the hand and the object. Sample points on the surface of the object are extracted from activated tactile sensors while the hand moves within the neighborhood of the initial grasping pose. We describe the exploratory path we used in Sect. 5.2.4. During the exploration, tactile contacts from sensors on different links are treated as separate groups. For example, a Barrett hand has four links with sensor pads, so there are up to four local geometries to reconstruct from the collected point clouds. The coordinate system for local geometry reconstruction is established at the center of each local point cloud. The z-axis of each local frame is aligned with the estimated surface normal, and the other two axes are aligned with the two remaining principal directions of the point cloud. It is assumed that a local geometry is smooth and can be represented by a quadratic function as follows
$$\begin{aligned} z = \alpha _{20}x^2 + \alpha _{11}xy + \alpha _{02}y^2 + \alpha _{10}x + \alpha _{01}y + \alpha _{00} \end{aligned}$$
(8)
Fitting the point cloud to the quadratic function above is an optimization process. We use levmar, an open source implementation of Levenberg-Marquardt nonlinear least squares algorithms in C/C++, to find the optimal parameters of the function (Lourakis 2004). With a set of optimal parameters, we can approximate the local geometry and synthesize a mesh for each contact.
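Because Eq. 8 is linear in its six parameters, the fit can also be posed as an ordinary linear least-squares problem; the sketch below uses NumPy rather than the levmar solver used in the paper:

```python
import numpy as np

def fit_quadratic_patch(points):
    """Fit Eq. 8, z = a20 x^2 + a11 xy + a02 y^2 + a10 x + a01 y + a00,
    to an (n, 3) point cloud expressed in the local contact frame.

    The model is linear in its parameters, so plain least squares
    suffices here (the paper uses the Levenberg-Marquardt solver levmar).
    """
    pts = np.asarray(points, float)
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    # design matrix: one column per monomial of the quadratic
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    params, *_ = np.linalg.lstsq(A, z, rcond=None)
    return params  # [a20, a11, a02, a10, a01, a00]
```

The recovered parameters define the mesh that is synthesized for each contact patch.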

4.5 Planning stable grasps on local geometry

With the local geometry built as a mesh model, the Eigengrasp planner (Ciocarlie and Allen 2009) can be used to search around the current hand pose and plan stable grasps on this local geometry. The Eigengrasp planner is a stochastic grasp planner that searches for stable grasps in a low-dimensional hand posture space spanned by eigenvectors called Eigengrasps. For example, the Barrett hand used in our experiments has seven joints and four DOFs. Two Eigengrasps \(E=<e_1, e_2>\) were used to describe the hand posture: one controls the spread angle and the other controls the finger flexion, as illustrated in Fig. 5. The wrist pose is sampled locally around the initial grasping pose using a complete set of six parameters: \(P=<roll, pitch, yaw, x, y, z>\). These six parameters generate a hand offset transformation from the current hand pose. Thus, the search space for the Eigengrasp planner in our case is an eight-dimensional space \(S=\{E,P\}\). The stability of a grasp is evaluated using the epsilon quality, \(\epsilon \), which measures the minimum magnitude of external disturbance that can break the grasp (Ferrari and Canny 1992).
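The 8-D search space \(S=\{E,P\}\) can be illustrated with a small sampling helper; the bounds passed in are placeholders, not values from the paper:

```python
import numpy as np

def sample_search_point(rng, e_low, e_high, p_low, p_high):
    """One random sample from the planner's 8-D search space S = {E, P}:
    two Eigengrasp amplitudes (e1: spread, e2: flexion) plus a 6-D local
    wrist offset <roll, pitch, yaw, x, y, z> around the initial pose.

    The bounds are placeholder assumptions, not values from the paper.
    """
    e = rng.uniform(e_low, e_high)   # hand posture amplitudes, shape (2,)
    p = rng.uniform(p_low, p_high)   # wrist offset parameters, shape (6,)
    return np.concatenate([e, p])
```

A stochastic planner such as simulated annealing would repeatedly draw and score such points, keeping the candidates with the best epsilon quality.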
Fig. 5

Two Eigengrasps used to specify the hand posture of a Barrett hand, \(e_1\) and \(e_2\). \(e_1\) controls the spread angle between two fingers. \(e_2\) controls the finger flexion of all three fingers

After the planning process is complete, a stable grasping pose is generated. Since the planning is done in the current hand coordinate system, a hand adjustment command in Fig. 2 can then be synthesized.

4.6 Apply hand adjustment

Once a hand adjustment command \(Adj^* = <p,o,s>\) is found, we need to apply this adjustment to the hand: change the hand pose and reshape the joints. We decompose this process into three steps.

First, the hand opens its fingers to release the object and backs up to leave a safety margin between the palm and the object before the subsequent movement.

Second, the selected DOFs change to the values specified by \(s\). The hand moves to a location 5 cm (subject to change for different hands) away from the goal position with the goal orientation \(o\).

Third, the hand moves in guarded mode towards the goal position. The hand will either reach the goal position or stop if it contacts anything before it reaches the goal.

We decompose the movement into these three parts because directly applying the adjustment \(Adj\) may cause a collision. We therefore first move to a safe pose away from the goal location, with the goal orientation, and then approach the goal position using guarded motions.
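Step two can be sketched as a small helper that backs the goal position off along the hand's approach direction; treating the 5 cm offset as lying along the approach axis is our assumption, since the paper does not specify the offset direction:

```python
import numpy as np

def standoff_position(goal_pos, approach_dir, backoff_m=0.05):
    """Intermediate position for step two of applying a hand adjustment:
    stand off `backoff_m` (5 cm for the Barrett hand) from the goal
    position along the hand's approach direction, keeping the goal
    orientation. The choice of the approach axis is an assumption."""
    a = np.asarray(approach_dir, float)
    a /= np.linalg.norm(a)           # normalize the approach direction
    return np.asarray(goal_pos, float) - backoff_m * a
```

From this standoff pose, the hand would then move forward in guarded mode until it reaches the goal or makes contact.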

5 Experimental results

We now describe the experiments of our work. We performed two sets of experiments: one for the grasp stability estimation procedure of Fig. 2 and the other for the entire pipeline of Fig. 2. Each set includes experiments in both simulation and physical settings. The robotic hand we used was a Barrett hand, a four-DOF hand with seven joints; each finger has one DOF with two coupled joints. The hand is equipped with four tactile pads, one on the distal link of each finger and one on the palm, resulting in a 96-sensor system. In the simulation experiments, we used a simulated version of this hand inside the GraspIt! simulator (Miller and Allen 2004).

5.1 Experimental results: grasp stability estimation

In this section, we show the experiments we did to test the grasp stability estimation procedure of Fig. 2. We first describe how we generated a set of training and test grasps and trained an SVM classifier. We then show experiments on estimating grasp stability of simulated grasps. In addition, we show experimental results on using the stability classifier in physical grasping scenarios.

5.1.1 Grasp dataset

Our grasp data is from the Columbia Grasp Database (CGDB) (Goldfeder et al. 2009a). This database contains hundreds of thousands of simulated grasps constructed from several robotic hands and thousands of object models. Since we used the Barrett hand in our experiments, we only chose grasps with this hand from the database. Object models used in the CGDB are from the Princeton Shape Benchmark (PSB) (Shilane et al. 2004). The PSB provides a repository of 3D models spanning many objects that we encounter every day. However, the PSB models were not originally selected with an eye towards robotic grasping, so some of them are not obvious choices for grasping experiments. For example, the model set contains insects, which are outside our everyday grasping range. Instead of using the full set of Barrett-hand grasps, we selected grasps computed on a smaller set of objects that are more frequently grasped and manipulated in everyday life. In total, we collected 36,960 robotic grasps from 704 objects across 19 different classes. Table 1 shows the object classes and the number of objects within each class.
Table 1

Object classes and number of object models in each class

  Skateboard (20)       Book (16)      Bottle (48)         Butcher knife (16)
  Wine glass (36)       Vase (88)      Hammer (16)         Handgun (80)
  Ice cream cone (48)   Knife (28)     Lamp (56)           Microscope (20)
  Phone handle (16)     Rifle (76)     Screw driver (20)   Wrench (16)
  Gear (36)             Helmet (40)    Mug (28)

For each grasp, the tactile feedback was simulated in the GraspIt! simulator. The output of the tactile sensors around each contact is characterized by the forces applied at each sensor cell, so a contact model that approximates the contact region and the pressure distribution is necessary for simulating tactile feedback. Pezzementi et al. (2010) used a point spread function model to simulate the response of a tactile sensor system. In our work, we built our tactile simulation system on the soft finger contact model of Ciocarlie et al. (2007b). This model is briefly summarized in the Appendix; interested readers can refer to the original papers for more details (Ciocarlie et al. 2007b; Dang et al. 2011).

Since each sensor pad on a Barrett hand is a flat plane, a stable grasp must have at least two sensor pads in contact with the object being grasped, resulting in at least two sensor pads with non-zero responses. We therefore rejected grasps with fewer than two responding sensor pads. We then split the grasp dataset into two disjoint subsets, \(\frac{2}{3}\) of the grasps for training and \(\frac{1}{3}\) for testing, with grasps uniformly distributed across all the objects.

5.1.2 Building a contact dictionary

Grasps in the training dataset span a wide range of grasping poses and object shapes, so it is reasonable to believe that their contacts statistically represent the space of potential grasp contacts well. We therefore used these grasps to build our contact dictionary. We first applied the K-means clustering algorithm (MacQueen 1967) to the contacts from this set of grasps to obtain a list of clusters, and then used the cluster centers as a set of representative contacts forming our contact dictionary. Since the reachable spaces of the Barrett hand's fingers rarely intersect, it is safe to assume that each contact location is associated with a contact orientation and vice versa. Based on this consideration, we only used the location part of a contact in the clustering process; thus, the space we clustered was a regular 3-dimensional Cartesian space. The distance function we chose for K-means measures the Euclidean distance between two contact locations. In K-means, \(k\) directly controls the dimension of a feature vector: a large \(k\) results in a dense sampling of the contact space, while a relatively small \(k\) yields a sparse sampling. Experimentally, we chose \(k=64\) to create 64 clusters, so a feature vector of a grasp is 64-dimensional. Figure 3 shows these 64 cluster centers within the Barrett hand's coordinate system.
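The dictionary construction can be sketched as a plain K-means over 3-D contact locations; the empty-cluster handling and fixed iteration count below are implementation choices, not details from the paper:

```python
import numpy as np

def contact_dictionary(contact_positions, k=64, iters=50, seed=0):
    """Build a contact dictionary: K-means over 3-D contact locations.
    The k cluster centers become the dictionary's representative contacts."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(contact_positions, float)
    # initialize centers with k distinct data points
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(iters):
        # assign each contact to its nearest center (Euclidean distance)
        labels = np.linalg.norm(pts[:, None] - centers[None], axis=-1).argmin(1)
        # move each center to the mean of its assigned contacts;
        # centers with no assigned contacts are left where they are
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pts[labels == j].mean(axis=0)
    return centers
```

A grasp's 64-dimensional feature vector is then built by relating its tactile contacts to these fixed dictionary centers.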

5.1.3 Grasp quality measurements

Different measurements could be used to evaluate the quality of a grasp, such as stability, feasibility, and dexterity (Suárez et al. 2006). In our work, the grasp quality measurements generate the ground truth for the labels of training grasp samples. We chose two quality measurements related to the stability of a grasp: the epsilon quality and the volume quality (Miller and Allen 1999), which are based on the grasp wrench space (GWS) generated by the grasp. These two grasp quality measurements provide analytical numbers to distinguish stable grasps from unstable ones. In our experiments, we modeled the material of the surface of the hand as rubber and the material of the object as wood and set the friction coefficient between the finger and the object as 1.0.

A GWS is a 6-dimensional space containing the set of possible resultant wrenches produced by the fingers on the object. A wrench is a 6-dimensional vector \([f^{1\times 3},\tau ^{1\times 3}]\) describing a combination of force and torque. In our work, a GWS is generated by assuming that the sum of the normal forces applied at the contacts is 1; this assumption approximates a limited power source for the hand (Suárez et al. 2006). Geometrically, the volume quality measures the volume of the potential wrench space, and the epsilon quality measures the radius of the largest ball centered at the origin of the GWS and fully contained in it.
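Both qualities can be computed from the convex hull of the contact wrenches. The sketch below uses SciPy's Qhull wrapper and assumes the hull of the given wrench set approximates the GWS; a negative epsilon then signals that the origin lies outside the hull, i.e., the grasp is not force-closed:

```python
import numpy as np
from scipy.spatial import ConvexHull

def grasp_qualities(wrenches):
    """Epsilon and volume quality of a grasp from its 6-D contact wrenches.

    The GWS is approximated by the convex hull of the wrench set; epsilon
    is the radius of the largest origin-centered ball inside the hull, and
    the volume quality is the hull's volume."""
    hull = ConvexHull(np.asarray(wrenches, float))
    # Qhull facets satisfy n.x + b <= 0 for interior points, with unit
    # normals n, so -b is the signed distance from the origin to each facet
    epsilon = float(-hull.equations[:, -1].max())
    return epsilon, float(hull.volume)
```

The test below uses the 6-D cross-polytope (vertices \(\pm e_i\)), whose inscribed-ball radius and volume are known in closed form.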

The epsilon quality, \(\epsilon \), refers to the minimum relative magnitude of an external disturbance that can destroy the grasp. When we take into account the limit on the maximum forces a robotic hand can apply, a grasp with a smaller epsilon quality is less stable: a relatively small external disturbance can break it even when the hand applies the maximum forces it supports. Another consideration comes from environment uncertainty. Due to uncertainty in the environment, objects may move away from their original positions during grasp execution. A fragile grasp may fail to hold the object in this situation, while a stable one may remain robust and still succeed despite the perturbation. We have experimentally found a correlation between this robustness and the epsilon quality: grasps with epsilon quality \(\epsilon > 0.07\) tend to be more robust under object pose perturbations (Weisz and Allen 2012).

The volume quality, \(v\), measures the volume of the potential wrench space generated by the grasp given unit contact normal forces. A grasp with a larger potential wrench space requires smaller forces at each contact than one with a smaller potential wrench space. This indicates that the larger the volume quality, the stronger the grasp can be.

5.1.4 Labeling grasps in the dataset

Given all the grasps in the grasp dataset, we plot their epsilon qualities and volume qualities in Fig. 6. We can observe that their epsilon qualities and volume qualities are not well correlated.
  • Grasps with high epsilon quality do not necessarily have good volume qualities.

  • Grasps with good volume qualities can still have low epsilon qualities.

In addition, Li and Sastry (1988) pointed out that the epsilon quality measure is not invariant to the choice of torque origin, so we used the volume quality as an invariant average-case quality measure for the grasp. Each measure has its own benefits, so we combine them into a single evaluation criterion. Based on our experimental results, we used thresholds \(t_{\epsilon } = 0.07\) and \(t_{v} = 0.1\) as the boundaries for the epsilon and volume qualities to label a grasp \(grasp_i\) as stable \((1)\) or unstable \((-1)\) as follows,
$$\begin{aligned} label(grasp_i) = \left\{ \begin{array}{rcl} -1 &{} \text{ if } &{} \epsilon _i \le t_{\epsilon }\; or\; v_i \le t_v\\ 1 &{} \text{ if } &{} \epsilon _{i} > t_{\epsilon }\; and\; v_i > t_v\\ \end{array} \right. \end{aligned}$$
(9)
where \(\epsilon _i\) denotes the epsilon quality of grasp \(grasp_i\) and \(v_i\) is the volume quality of grasp \(grasp_i\).
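Eq. 9 translates directly into a labeling function, with the paper's thresholds as defaults:

```python
def label_grasp(epsilon, volume, t_eps=0.07, t_vol=0.1):
    """Eq. 9: a grasp is labeled stable (+1) only if BOTH quality
    measures exceed their thresholds; otherwise it is unstable (-1)."""
    return 1 if (epsilon > t_eps and volume > t_vol) else -1
```

Requiring both thresholds reflects the observation above that the two qualities are poorly correlated, so neither one alone separates stable from unstable grasps reliably.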
Fig. 6

Epsilon and volume qualities of grasps from a subset of objects in the CGDB. This figure shows that the volume quality and the epsilon quality of a grasp do not correlate with each other very well. This indicates that by combining these two grasp metrics, we can get a more comprehensive evaluation criterion

5.1.5 Grasp stability estimation on simulated grasps

To compute grasp features, we wanted a \(\sigma \) in Eq. 3 that maximizes the range of \(h_i\) values in \([0,1]\) and maximally distinguishes different contacts. In our experiments, we analyzed the range of the contacts from all the grasps of a Barrett hand and experimentally set \(\sigma \) to 36.45. We used libsvm (Chang and Lin 2001) to train an SVM on the training data, which contains 24,640 grasps, and tested the SVM on the remaining 12,320 grasps. Table 2 summarizes the number of stable and unstable grasps in both the training and test datasets. Figure 7 shows the classification result in more detail. The overall accuracy across all object classes is 81.0 %. In the context of a physical grasping process, the percentage of false positive predictions is a crucial evaluation criterion, because a false positive prediction will guide the robot to apply an unstable grasp as if it were stable. In most working conditions, this is very risky or even unacceptable. In Fig. 7, we show the percentages of erroneous and false positive predictions for each object class. The percentages of false positive predictions illustrate the probability that an unstable grasp is incorrectly classified as stable during a grasping task. For most object classes, these probabilities are relatively low; the overall false positive rate is 8.6 %.
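The training setup can be sketched with synthetic data. The features below are random placeholders (not real tactile data), the labels are an artificial separable rule, and scikit-learn's SVC is used in place of calling libsvm directly, although it wraps the same library:

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for the stability classifier: 64-D feature vectors (one
# entry per contact-dictionary cluster, Sect. 5.1.2) with +/-1 labels as
# in Eq. 9. Features and labels here are synthetic placeholders only.
rng = np.random.default_rng(0)
X_train = rng.random((200, 64))
y_train = np.where(X_train[:, 0] > 0.5, 1, -1)   # artificial separable labels
clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

X_test = rng.random((50, 64))
accuracy = (clf.predict(X_test) == np.where(X_test[:, 0] > 0.5, 1, -1)).mean()
```

In the paper's setting, `X_train` would hold the 64-dimensional tactile feature vectors and `y_train` the Eq. 9 labels computed from the simulated grasp qualities.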
Table 2

Learning performance in simulation

  Dataset    Grasps    Stable    Unstable    Accuracy
  Training   24,640    11,849    12,791      –
  Test       12,320     5,914     6,406      81.0 %

Fig. 7

Accuracy analysis of estimating grasp stability on simulated grasps. Horizontal axis is the group names for each object class. Vertical axis is the percentage (%) of the overall false predictions (dark green bars), the false positive predictions (light green bars), and the false negative predictions (yellow bars) per object class. As shown in the graph above, the percentages for false positive predictions per object class (light green bars) are relatively low, which is necessary for blind grasping (Color figure online)

5.1.6 Using the grasp stability estimator in physical grasping

To evaluate the performance of the classifier trained with simulated data in physical grasping scenarios, we did some experiments with a Barrett hand on six everyday objects: a pencil cup, a mug, a candle box, a paper wipe box, a canteen, and a decorative rock as shown in Fig. 8a. Only the mug belongs to an object class that is included in the simulated training data. The pencil cup, the candle box, and the paper wipe box are objects to some extent similar to the bottle class in the training set. The canteen and the decorative rock are two objects that are very different from other objects in the training set.
Fig. 8

Physical experiments on using a grasp stability estimator for grasping. a The six objects in the experiments: a pencil cup, a mug, a candle box, a paper wipe box, a canteen, and a decorative rock. b–d Three stable grasps on three different objects in the experiment

In each experiment, we placed an object at a predefined location on a table in front of the robot. The robot approached the predefined location with different spread angles from a direction chosen from a list of predefined directions. When the robot contacted the object, it stopped approaching and closed its fingers. Tactile data and joint angles were then collected for grasp stability estimation using the classifier trained in Sect. 5.1.5. The arm lifted the object once a stable grasp was perceived. A trial was considered a failure when the robot was not able to grasp the object stably, i.e., the object fell out of the hand when the robot tried to lift it.

110 trials were performed on the six objects, including a canteen with varying weight and surface material and a mug filled to varying weights. Figure 8b–d show several snapshots of successful lift-ups. In Table 3, we summarize the experimental results. Overall, the success rate is 84.6 % across all the objects in our experiments. The canteen without its fabric cover and the decorative rock are very slippery and difficult to grasp; compared to the other objects, far fewer stable grasps exist on them. The decorative rock is convex and much of its surface faces upward, so friction becomes the main force opposing gravity during grasping, making the rock an even more difficult target. Another source of difficulty is the geometrical complexity of the rock: its surface is rough but the tactile sensors have limited sensing resolution, so it is difficult to distinguish grasps at nearby locations that have different surface normals at the contacts but generate the same tactile feedback.
Table 3

Experiment results on six objects

  Object       Mass (kg)    # of exp    Success    Rate (%)
  Mug          0.43–0.93    30          28         93
  Paper box    0.17         10           9         90
  Pencil cup   0.09         10           9         90
  Candle box   0.11         10           9         90
  Rock         0.28         10           6         60
  Canteen      0.5–0.75     40          32         80
  Total        0.09–0.93    110         93         84.6

5.2 Experimental results: stable grasping with post-execution grasp adjustment

This section describes the experiments we did for the entire pipeline illustrated in Fig. 2. We will explain how we built a tactile experience database and how the pipeline improves grasping performance under assumed pose uncertainty.

5.2.1 Experimental setup

In our experiment, the selected DOFs \(s\) in a hand adjustment \(Adj=<p,o,s>\) controlled the spread angle of the Barrett hand. We chose five commonly seen objects as our test objects, shown in Figs. 9 and 10: a Snapple bottle, a box, a detergent bottle, a cup, and a decorative rock. We assumed a table-top grasping scenario where the objects rest on a flat surface. In this situation, the pose error can be parameterized by \(<x,y,\theta >\) as illustrated in Fig. 11. In our experiments, we intentionally generated a list of pose errors with an approximately uniform distribution over \(x \in [-30,30]\) in millimeters, \(y \in [-30,30]\) in millimeters, and \(\theta \in [-20,20]\) in degrees. By injecting different pose errors into a stable grasping pose, we could perturb the stable grasp from its ideal grasping pose and generate grasping scenarios with different pose uncertainty.
Fig. 9

Object models used in the simulation experiments: a Snapple bottle, a box, a detergent bottle, a cup, and a decorative rock

Fig. 10

Objects used in the physical experiments: a Snapple bottle, a box, a detergent bottle, a cup, and a decorative rock. The transparent part of the Snapple bottle was painted blue to facilitate the object recognition process using a vision system

Fig. 11

Pose error model for table-top grasping, parameterized by \(<x, y, \theta >\). Assuming a table-top grasping scenario, this model considers translational error (\(x,\,y\)) within the \(x-y\) supporting plane and rotational error (\(\theta \)) around the normal direction of the supporting plane

5.2.2 Building a tactile experience database

The tactile experience database stores stable grasps as tactile experience. To build it, we defined a stable grasp for each object model in Fig. 9 using the GraspIt! simulator. For each stable grasp stored in the database, we also precomputed the tactile feedback at grasping poses perturbed from the stable grasp by pose error. To do this, we first put the hand at the ideal grasping pose. Then, we uniformly sampled the space of pose uncertainty \(S = \{<x, y, \theta >|x\in [-30,30], y\in [-30,30], \theta \in [-20,20]\}\) and used each sampled pose error \(<x,y,\theta >\) to perturb the object and generate the tactile feedback at each perturbed grasping pose. In our work, the sampling resolution is 5 mm in dimension-\(x\) and dimension-\(y\) and 5 degrees in dimension-\(\theta \). For the spread angle, we sampled 5 degrees above and below the ideal spread angle of the grasp. Thus, this precomputation generated 4,572 perturbed grasping poses for each stable grasp. The precomputation took place off-line and the database was stored for later use. Figure 12 shows two examples of stable grasps on two different objects and four exemplar local tactile experience records generated from the corresponding pose errors. These stable grasps, along with the tactile feedback from the perturbed grasping poses, were stored to form our tactile experience database.
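The sampling grid can be sketched as follows. Note that this straightforward enumeration yields 13 × 13 × 9 × 3 = 4,563 poses, slightly fewer than the 4,572 reported, so the paper's exact endpoint or spread-angle handling must differ somewhat; the grid below is therefore an approximation:

```python
import itertools
import numpy as np

def perturbation_grid(step_mm=5, step_deg=5, spread_deltas=(-5, 0, 5)):
    """Uniform grid over the table-top uncertainty space of Sect. 5.2.2:
    x, y in [-30, 30] mm, theta in [-20, 20] degrees, plus spread-angle
    offsets. Endpoint handling here is an assumption."""
    xs = np.arange(-30, 31, step_mm)    # 13 samples in x
    ys = np.arange(-30, 31, step_mm)    # 13 samples in y
    ts = np.arange(-20, 21, step_deg)   # 9 samples in theta
    return list(itertools.product(xs, ys, ts, spread_deltas))
```

Each grid entry corresponds to one perturbed grasping pose whose simulated tactile feedback is recorded into the database.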
Fig. 12

Examples of stable grasps and precomputed local tactile experience. Each stable grasp in the tactile experience database is stored with a complete set of parameters that can be used to reconstruct the grasp, including the joint values and the hand pose with respect to the object. The local tactile experience for nearby perturbed grasps is precomputed based on a list of precomputed pose errors for the object, as described in Sect. 5.2.2

For the parameter in the distance function, Eq. 6, we empirically chose \(\alpha = 100\) so that a 0.01 radian difference in joint angles is equivalent to 1 mm in Euclidean distance. We also experimentally chose two thresholds for the decision diamonds in the hand adjustment procedure of Fig. 2. If the distance of an actual grasp to one of the nearest neighbors in the database is less than \(t_1 = 10.0\), we consider the grasp close enough to experience. If the distance of an actual grasp to every one of the nearest neighbors is greater than \(t_2 = 30.0\), we consider the actual grasp too far from experience and no similar experience is found.

5.2.3 Grasping with hand adjustment in simulation

In the simulation test, our goal is to see how our hand adjustment procedure can improve grasping performance starting from a grasp that is perturbed from the ideal grasping pose due to pose uncertainty. Our experiments were conducted on the object models shown in Fig. 9. To simulate pose error, we randomly sampled the space of pose uncertainty \(S = \{<x, y, \theta >|x\in [-30,30], y\in [-30,30], \theta \in [-20,20]\}\) and generated 110 pose errors. We injected each pose error into each stable grasp in the tactile experience database, creating 110 perturbed grasps per object as the initial grasping poses for testing. The evaluation procedure then started by closing the fingers at an initial grasping pose, followed by five consecutive hand adjustments to validate how hand adjustments influence grasp stability. We also recorded and analyzed the closest distance to a stable grasp in the database as an indicator of how hand adjustments help reduce the tactile contact difference to stable grasps in the tactile experience database.

Figure 13 shows the mean distance to the closest stable grasp in the tactile experience database. As hand adjustments were applied, the grasps were optimized in terms of their distances to the stable grasps in the database. Figure 14 shows the percentage of stable grasps as five consecutive hand adjustments were made. The main trend is that as hand adjustments are applied, the percentage of stable grasps increases, which indicates that hand adjustments based on tactile sensing data do improve the robustness of the grasping procedure under pose uncertainty.
Fig. 13

Distance to the nearest stable grasp before each hand adjustment is applied. The horizontal axis is the hand adjustment index; the vertical axis is the distance to the nearest stable grasp. As hand adjustments are applied, the distance of the actual grasp to the nearest stable grasp in the tactile experience database decreases

Fig. 14

Percentage of stable grasps before each hand adjustment is applied, averaged over 110 grasping trials. The horizontal axis is the hand adjustment index; the vertical axis is the percentage (from 0 to 1) of stable grasps whose epsilon quality \(\epsilon > 0.1\). The first bar in each graph corresponds to the percentage of stable grasps after the initial grasp is executed. These graphs show that the main trend of the percentage of stable grasps is increasing as hand adjustments are applied, which indicates that applying hand adjustments improves grasping performance

5.2.4 Grasping with hand adjustment on a real robot

In the physical experiments, the Barrett hand was attached to a six-DOF Staubli robotic arm. Objects used in our test are shown in Fig. 10. The tactile experience database used in the physical experiments was the same as in the simulation experiments. A Kinect sensor first acquired a 3D point cloud of the scene. Our perception system used the recognition method of Papazov and Burschka (2010), which uses the partial geometry of an object to recover its full 6D pose. Once the pose of an object was estimated, a predefined stable grasp was retrieved and perturbed by a pose error generated in the same way as in the simulation test. Finally, the OpenRave planner (Diankov and Kuffner 2008) generated a collision-free trajectory to the grasping pose, and the hand moved to the target pose and executed the grasp. After this initial grasp was established, the hand adjustment procedure proceeded to improve it. Strictly speaking, the perception system also introduced error into the system, but since the Kinect was relatively close to the object, the pose error from the perception system was not significant compared to the injected pose error. In this sense, we considered the injected pose error the major error in the system. For each object, we ran 10 grasping trials, each with a different pose error.

When the robot had exited the pipeline of Fig. 2 via the stable grasp achieved state, it would lift up the object. After the lift-up action, a “shake test” took place by rotating the last joint of the robotic arm within a range of \(\pm 60\) degrees. The scoring criteria for a grasping test were as follows:
  • If the object falls on the table after lift up or the shake test, score 0;

  • If the object moves in hand during finger close, lift-up, or the shake test but stays in hand in the end, score 0.5;

  • If the object stays stable in hand throughout the entire grasping process, score 1.

The intuition behind this set of criteria was that a stable grasp should keep the object stationary relative to the hand throughout finger close, lift-up, and the shake test.
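The scoring criteria above can be sketched as a small scoring function; the boolean flags are hypothetical names for the observed trial outcomes:

```python
def grasp_score(fell, moved_in_hand):
    """Score one grasping trial per the criteria above:
    0   if the object falls during lift-up or the shake test,
    0.5 if it moves in hand at some stage but stays in hand at the end,
    1   if it stays stable in hand throughout the entire process."""
    if fell:
        return 0.0
    return 0.5 if moved_in_hand else 1.0
```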
When similar tactile experience is found in the tactile experience database, hand adjustments are computed based on tactile experience. Figure 15 gives an example of hand adjustment on a detergent bottle with similar tactile experience. Initially, the palm did not touch the detergent bottle and only three fingertips touched the surface of the bottle with small contact areas, making the grasp fragile. After four consecutive hand adjustments, the grasp was optimized so that the palm touched the detergent bottle and formed a strong power grasp.
Fig. 15

An example of hand adjustment with tactile experience. a The initial unstable grasp where the palm did not touch the detergent bottle. b–e The grasp status after each hand adjustment was applied. f The robot hand lifts the object off the table

When no similar tactile experience is found in the tactile experience database, the local surface around each contact is reconstructed for grasp planning. In our work, the robotic hand moved along the \(x\) and \(z\) directions of its palm as illustrated in Fig. 16 and collected contact points by closing the fingers at each waypoint. The range of motion was 50 mm along these two directions with an interval of 5 mm. This process is relatively time-consuming compared to calculating a hand adjustment directly from tactile experience. In our experiments, only 4 out of the 50 trials ended up with local geometry reconstruction. Figure 17 shows an example of a Barrett hand executing a grasp after it reconstructed the local geometry of a Snapple bottle and planned two stable grasps using the reconstructed local geometry.
Fig. 16

Exploratory directions of a Barrett hand for local geometry reconstruction. In the exploration process, the robotic hand moves along the \(x\) and \(z\) directions and collects contact points on the surface of the object at each waypoint by closing its fingers. The range of the motion is 50 mm along \(x\) and \(z\) directions with an interval of 5 mm
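The exploration pattern described above (50 mm range, 5 mm interval along the palm's \(x\) and \(z\) directions) can be sketched as a waypoint generator. Whether the sweep is a full grid over both directions or two separate line sweeps is an assumption of this sketch, as is the centering of the range about the initial pose:

```python
def exploration_waypoints(step_mm=5, range_mm=50):
    """Generate palm-frame waypoint offsets (in mm) for local geometry
    reconstruction: a grid over the palm's x and z directions, centered
    on the initial hand pose (an assumption of this sketch).  The hand
    closes its fingers at each waypoint to collect contact points."""
    half = range_mm // 2
    offsets = range(-half, half + 1, step_mm)
    return [(dx, dz) for dx in offsets for dz in offsets]
```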

Fig. 17

Grasping using grasps planned on reconstructed local geometry. a A snapshot of the executed grasp planned using the GraspIt! simulator on the local geometry shown in (b). In b, the reconstructed local geometry is shown in gray with original data points extracted from tactile sensing data shown in black. c, d Two example grasps planned based on the local geometry. Grasp in (c) was executed since it had a larger epsilon quality. The transparent bottle model shown in (c) and (d) is used here solely for visualization

Table 4 shows the details of the tests of grasping using our grasping pipeline. As a comparison, we also ran 10 grasping trials for each object, starting at the same initial grasping pose, using a conventional grasping pipeline without the post-execution grasp stability estimation and hand adjustment procedure. Figure 18 shows the grasping performance with and without our post-execution procedure. Grasping scores give a more detailed picture than success counts: a trial counts as a success as long as the object is in hand after lift-up, yet the object may still move in hand during finger closing, lift-up, or the shake test, and the scores capture this. For each object, the overall grasping performance is improved using our method.
Table 4

Details of grasping with a tactile experience database

Object    | # of grasps | Avg. # adj. | Lift-up\(^{a}\) | Score

Snapple   | 10 | 5.5 | 10 | 1.0

Box       | 10 | 3.3 | 10 | 0.95

Detergent | 10 | 2.1 | 8  | 0.75

Cup       | 10 | 2.4 | 10 | 0.95

Rock      | 10 | 3.0 | 10 | 0.95

\(^{a}\)The robot hand successfully lifted up the object. The object could have moved during the grasping and lift-up procedure
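The Score column of Table 4 is the per-trial score (0, 0.5, or 1) averaged over the 10 trials. A minimal sketch, with an illustrative breakdown that is consistent with, but not necessarily identical to, the recorded trials:

```python
def average_score(trial_scores):
    """Average per-trial grasping scores (each 0, 0.5, or 1) over a set
    of trials, as reported in the Score column of Table 4."""
    return sum(trial_scores) / len(trial_scores)

# For example, a score of 0.95 over 10 trials is consistent with nine
# fully stable trials and one trial where the object moved in hand --
# an illustrative breakdown, not the one recorded in the experiments.
```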

Fig. 18

Grasping scores w/ and w/o the post-execution grasp adjustment procedure. Bars in blue show the scores of grasping using our method and bars in red show the scores of grasping without the grasp adjustment procedure. The scoring criteria are discussed in Sect. 5.2.4 (Color figure online)

6 Discussion

In our work, we assumed a table-top grasping scenario where an object is placed on a supporting plane. For this scenario, we used the pose error model illustrated in Fig. 11 to parameterize the uncertainty space due to pose error, which is a 3-dimensional space, and we created our tactile experience database according to this model. However, in a more general robot system, grasping situations can differ from what we have assumed, and the uncertainty space can be as complicated as 6-dimensional: three dimensions for translational uncertainty and three for rotational uncertainty. Our method would still apply in this case, but the tactile experience database would have to be built by sampling the 6-dimensional space, and the search for hand adjustments would have to be done within it. Since the number of samples grows exponentially with the dimension, this space is far larger than the current 3-dimensional one, and more advanced search algorithms would be needed to keep the method efficient.
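The dimensionality argument above is simple to make concrete: covering the uncertainty space with a fixed grid resolution requires a number of samples exponential in the number of dimensions. A minimal illustration:

```python
def num_grid_samples(samples_per_dim, dims):
    """Number of samples needed to cover an uncertainty space with a
    regular grid at a fixed resolution per dimension.  Moving from the
    3-D table-top model to a general 6-D pose uncertainty space squares
    the sample count for the same per-dimension resolution."""
    return samples_per_dim ** dims
```

For instance, at 10 samples per dimension, the 3-D model needs \(10^3\) samples while the 6-D space needs \(10^6\), which is why naive sampling does not scale and smarter search is required.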

We injected artificial errors into the execution of initial grasping poses. The reason is two-fold: (1) the kinematics of our robot arm are precise and consistent, and (2) the pose error from the vision system is relatively small. We therefore injected errors significant enough to make a measurable difference. In the future, we will run experiments on less accurate robot arms to see how our grasping pipeline performs.

In the experiments of Sect. 5.2, we showed the process to build our tactile experience database using commonly grasped objects. In order to scale up our system, more objects and grasps should be added using the same method as described in Sect. 5.2.2.

One practical challenge for our current hand adjustment procedure is that, with the current hardware, releasing and re-grasping the object can disturb it: the object may be moved during finger closing or releasing. We believe this problem can be alleviated through two different methods. One is integrating more sensitive sensors to detect gentle touch before the object is moved, e.g., the strain gauges in the Barrett hand. Another is to utilize proximity sensors, which can predict contact locations before the object is touched. We will look into these possibilities in our future work.

Another challenge is to deal with objects which are small relative to an anthropomorphic hand, e.g., wrenches and pens. These objects are usually difficult to grasp from their natural poses on a support surface, e.g., a tabletop. We think more advanced control methods should be integrated to establish the initial grasp. One potential approach is to utilize the work done by Kazemi et al. (2012).

In addition, we will add more grasp quality measurements into our work and compare their influence on the performance of our pipeline. Currently we have used the epsilon quality and the volume quality of a grasp as they are widely used in the robotics community. There are other grasp quality measurements available and some of them may benefit our grasping pipeline in general. So, by adding more quality measurements into our pipeline, we may be able to have a more comprehensive quality evaluation criterion.

At the core of our stability estimation and hand adjustment algorithms, the focus is the local geometry at contact, and no assumption is made about the global shape of an object. Our methods are therefore global-shape-independent, which makes it possible to extend them to grasping novel objects. One approach is to utilize the idea of “grasping by parts”: we first extract grasping parts from the objects in the tactile experience database, using methods such as that proposed by Detry et al. (2012), and look for similar grasping parts on novel objects. With similar grasping parts found on a novel object, we can align them and synthesize initial grasp hypotheses. Once the initial hypothetical grasp is executed, our post-execution grasp adjustment can take place to optimize the grasp. We will address this problem in our future work.

7 Conclusion

In this paper, we presented a grasping pipeline which is more robust under pose uncertainty than a conventional planning-based grasping pipeline. We developed a closed-loop post-execution grasp adjustment procedure to estimate the stability of the executed grasp and make necessary hand adjustments accordingly. To estimate grasp stability, we used a bag-of-words model to extract grasp features from the tactile sensing data and trained an SVM classifier to distinguish stable and unstable grasps. To synthesize a hand adjustment, we built a tactile experience database consisting of a set of stable grasps and their corresponding tactile sensor feedback. This database provides stable grasps against which we can localize an executed grasp and synthesize the necessary hand adjustment. Experiments were conducted in both simulation and physical settings. The experimental results indicate that our grasping pipeline with a post-execution grasp adjustment procedure improves grasping performance under pose uncertainty compared to a conventional grasping pipeline.

Footnotes
1

Usually, a robot hand contains several DOFs, but we only want to control a subset of them during a hand adjustment procedure. For example, for the Barrett hand, we would only control its spread angle during a hand adjustment; the finger-flexion DOFs are controlled during hand closing.

 

Acknowledgments

This work is funded by NSF Grant IIS-0904514.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Computer Science Department, Columbia University, New York, USA