Adaptive assembly grippers
The first gripper, the pincher gripper (Fig. 4b), is a simple parallel-plate mechanism with reinforced 3D-printed fingers. The fingers are shaped and elongated to pick up a variety of parts, including motors, fixing plates, and smaller parts such as bolts. This gripper uses an Mbed microcontroller to communicate with the PC, to control the single motor, and to read the motor current and switch. The motor current defines the end-stops and the force applied to any object in the gripper. The switch defines the home position; starting from this central position saves time when opening and closing.
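The current-based end-stop and grip detection can be sketched as follows. The threshold, stepping interface, and simulated readings below are illustrative stand-ins, not taken from the actual Mbed firmware.

```python
# Sketch of current-based end-stop detection: close the gripper until the
# motor current indicates contact. All names and values here are assumed
# for illustration; the real firmware runs on the Mbed microcontroller.

CURRENT_LIMIT_MA = 250  # assumed stall threshold in milliamps

def close_until_contact(read_current_ma, step_motor, max_steps=1000):
    """Step the motor closed until the current exceeds the limit,
    i.e. an object or the mechanical end-stop has been reached.
    Returns the number of steps travelled, or None if never stalled."""
    for step in range(max_steps):
        step_motor()
        if read_current_ma() > CURRENT_LIMIT_MA:
            return step + 1
    return None

# Simulated hardware: the current jumps once the fingers meet an object
# after 40 steps of travel.
position = {"steps": 0}
def step_motor():
    position["steps"] += 1
def read_current_ma():
    return 120 if position["steps"] < 40 else 400

steps = close_until_contact(read_current_ma, step_motor)
```

A similar loop monitoring the micro-switch input, rather than the current, could be used to locate the home position.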
The second gripper, the rotary gripper (Fig. 4a), is a more specialised design. It uses two motors, one to open and close the fingers and one to rotate them. The fingers can rotate continuously and independently of the gripper position, enabling the screwing-in motion for assembly. The fingers contain springs which force them open; this is exploited to enable another method of picking, gripping a part from the inside. When the position bearing is extended, the fingers are squeezed together. This manipulator excels at picking up cylinders and triangular or hexagonal prisms. The limits of the open and closed positions of the gripper are detected by monitoring the motor current. The rotations of the fingers are tracked by a single micro-switch, which also allows position calibration of the rotating head. The gripper enables the robot to adapt, by picking up tools with different heads and sizes, and to perform agile assembly with new parts.
The key task which this gripper enables is the use of tools, specifically Allen keys for screwing in bolts. We have developed an innovative approach to keep bolts on the end of the Allen key: the use of grease. The Allen key is dipped in grease and then inserted into the bolt head; when lifted, the bolt remains attached to the key. This combination of the rotary gripper and tool usage makes the gripper very powerful and highly adaptive to many different tool types.
Adaptive kitting grippers
One fixed-size gripper has been designed which allows kitting of all of the small parts (washers, nuts, bolts, etc.). This gripper has a soft adhesive pad (made from Blu-tack) which allows parts to be picked using adhesion. Adhesion has previously been shown to be an effective method for pick and place, and also for climbing or holding onto walls.
To remove the part from the pad, a servo-controlled sleeve pushes the part off the adhesive pad. The size of the adhesive pad has been designed to have sufficient adhesive and tack force to lift a single piece of any of the small parts, whilst also allowing only one piece to be picked, to minimise the precision required from the vision system. Figure 4c shows the gripper developed. This method achieves adaptive gripping, as the same gripper can be used to grip many parts of different form factors with no physical changes. It also demands minimal accuracy and precision from the vision system, minimising the development of custom systems.
Collaborative arm control
Collaborative two-arm control is required to achieve some of the complex assembly tasks. This provides the ability to:
Pass parts between grippers so they can be held in an optimum position or by an alternative gripper.
Hold certain parts stable whilst another gripper screws or otherwise interfaces with other parts.
Perform sub-assemblies, for example, put washers on a screw held in another hand before this sub-assembly can then be integrated with the rest of the system.
To achieve this, the two arms were calibrated together by determining the co-ordinates for the two robots at three separate points. One point is the base point, the other two points form vectors which are used to extrapolate the positions of the robots relative to each other. This allows the two hands to move together or move relative to each other. Figure 5 shows collaborative two-arm and gripper control which enables complex movements including the assembly and bolting of a motor to a motor bracket.
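The three-point calibration can be sketched as follows, assuming an orthonormal frame is built at each robot from the base point and the two vectors, so that the rotation and translation between the robots can be recovered. Variable names and the example transform below are illustrative.

```python
# Sketch of two-arm calibration from three shared points, assuming each
# robot measures the same three physical points in its own frame.
import numpy as np

def frame_from_points(p0, p1, p2):
    """Orthonormal frame built from a base point and two direction vectors."""
    x = p1 - p0
    x = x / np.linalg.norm(x)
    z = np.cross(x, p2 - p0)
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    return np.column_stack((x, y, z))

def calibrate(points_a, points_b):
    """Return (R, t) mapping robot A coordinates into robot B coordinates."""
    fa = frame_from_points(*points_a)
    fb = frame_from_points(*points_b)
    R = fb @ fa.T
    t = points_b[0] - R @ points_a[0]
    return R, t

# Example: robot B's frame is robot A's frame rotated 90 degrees about z
# and shifted; the calibration should recover exactly this transform.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
shift = np.array([0.5, 0.1, 0.0])
pts_a = [np.array(p, float) for p in ([0, 0, 0], [1, 0, 0], [0, 1, 0])]
pts_b = [Rz @ p + shift for p in pts_a]

R, t = calibrate(pts_a, pts_b)
new_point = R @ np.array([0.2, 0.3, 0.4]) + t  # an A-frame point in B's frame
```

With (R, t) in hand, a target pose commanded to one arm can be expressed in the other arm's coordinates, which is what allows the two hands to move together or relative to each other.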
Force feedback algorithms for insertion and hole finding have been developed to remove the need for high precision and hard coding, making the system more adaptive when the parts to be assembled change. The force measurements from the UR5 are imprecise and fluctuate, especially when accelerating; this is overcome by monitoring the difference in forces and thresholding. The hole finding function (Fig. 6) moves the end-effector towards the hole until either a force limit is exceeded or the final position is reached. If the force limit is exceeded before the final position, the hole has not been found, and the end-effector performs a hunt, moving in circles of increasing radii until the force drops significantly, indicating the hole has been located. The robot then attempts a final force move to fully insert into the hole.
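The hunting behaviour can be sketched as follows. The probe interface, force model, and thresholds are illustrative stand-ins for the UR5 force readings, not the actual controller code.

```python
# Sketch of force-thresholded hole hunting: probe straight down, and if the
# force limit trips, search on circles of growing radius until the measured
# force drops, indicating the hole. All values here are illustrative.
import math

def find_hole(probe, force_limit, max_radius, radius_step=1.0, points=12):
    """probe(x, y) returns the reaction force at an (x, y) offset.
    Returns the (x, y) offset where the force dropped, or None."""
    if probe(0.0, 0.0) < force_limit:
        return (0.0, 0.0)  # entered the hole on the first push
    radius = radius_step
    while radius <= max_radius:
        for k in range(points):
            a = 2.0 * math.pi * k / points
            x, y = radius * math.cos(a), radius * math.sin(a)
            if probe(x, y) < force_limit:
                return (x, y)  # force dropped: hole located
        radius += radius_step  # widen the hunt
    return None  # hunt exhausted without finding the hole

# Simulated surface: a hole of radius 1.5 centred at offset (2, 0).
def probe(x, y):
    return 0.0 if math.hypot(x - 2.0, y) < 1.5 else 10.0

offset = find_hole(probe, force_limit=5.0, max_radius=5.0)
```

In the real system the probe step would be a guarded downward move, and the drop in force would be detected by the difference-and-threshold scheme described above.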
This insert function has several input parameters which can be tuned for different holes and environments, including the force limit, the circle radii, the speed of rotation, and whether the hunting continues if the final force move in Fig. 6 does not reach the final position.
This insert function is used widely within the assembly: to insert Allen keys into bolt heads, the pulley onto the motor shaft, bolts into bolt holes, the shaft into the shaft housing, and much more. The universal nature of the function allows it to be used for different parts, in different locations, and to serve different overall functions.
Vision is used to detect and localise the different parts. It was important to develop a vision and learning system which can be rapidly expanded to include new or altered parts.
Data set and image pre-processing
A core part of the object recognition process is image pre-processing. The data set used for the experiments comprises 1500 RGB images of dimension \(1920\times 1080\times 3\). Each image was taken by a camera directly above the workspace, facing perpendicularly down onto the object mat (see Fig. 7a). For each image in the data set, the mat was shifted and rotated manually at random, and the objects were relocated to different positions. The data set was labelled by selecting a bounding box over each object in the original images, using a method which decreases labelling times.
Before feeding the data to a network, it is necessary to make the objects in the data set comparable to each other. First, we convert the RGB images to greyscale, reducing the dimensionality of the input layer and thus the training time, and forcing the network to focus on geometrical information rather than colour discrimination. We choose an object-detection image size of \(300\times 300\), based on the dimension of the largest object in the figures, i.e. the belt, and automatically crop each object based on its labelled bounding box. Each object image is padded on each side to reach the standard \(300\times 300\) dimension. As shown in Fig. 7b, the object is thus at the centre of the image, surrounded by ‘0’ pixel values. Here, the dimension of the outer padding provides useful information for object discrimination. Moreover, ‘0’ pixels do not excite any weight units in the network and are thus naturally discarded for discrimination. After the pre-processing procedure, the data set comprises 15,200 images, each containing a single object (Fig. 8).
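The crop-and-pad step can be sketched as follows. The luminance weights and the array layout are standard choices assumed here; the paper does not specify them.

```python
# Sketch of the pre-processing: greyscale conversion, cropping to the
# labelled bounding box, and zero-padding the crop so the object sits at
# the centre of a 300 x 300 image. Weights and layout are assumptions.
import numpy as np

TARGET = 300

def preprocess(rgb_image, bbox):
    """rgb_image: H x W x 3 array; bbox: (top, left, height, width).
    Returns a 300 x 300 greyscale crop centred in a zero background."""
    # Standard luminance weights; any greyscale conversion would do here.
    grey = rgb_image.astype(np.float32) @ np.array([0.299, 0.587, 0.114])
    top, left, h, w = bbox
    crop = grey[top:top + h, left:left + w]
    out = np.zeros((TARGET, TARGET), dtype=np.float32)
    y0 = (TARGET - h) // 2
    x0 = (TARGET - w) // 2
    out[y0:y0 + h, x0:x0 + w] = crop  # object centred, '0' pixels around it
    return out

# Example: a 1080 x 1920 frame with one labelled object.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
frame[100:150, 200:280] = 255  # a bright 50 x 80 "object"
patch = preprocess(frame, bbox=(100, 200, 50, 80))
```

The amount of surrounding zero padding then encodes the object's physical size, which is the discriminative cue mentioned above.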
Inception convolutional neural network for object recognition
To perform object recognition and to cope with objects of different sizes, we devised a shallow inception convolutional neural network (ICNN). The network comprises an input layer, an inception layer, two convolutional layers with 3 \(\times \) 3 kernels and 32 channels (shallow convolutions), and two fully connected layers, before an output layer classifying each object into its class with a softmax function. The inception layer is formed of four parallel convolution layers with 20 channels each and increasingly larger kernels of 2 \(\times \) 2, 3 \(\times \) 3, 5 \(\times \) 5 and 7 \(\times \) 7. The range of kernel sizes allows the network to learn local features at different scales, thus coping with the varying sizes of the objects in the data set. All units in the network perform a ReLU non-linear transformation.
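The inception layer's parallel-kernel structure can be sketched as follows. This is an illustrative NumPy rendering with random weights, 'same' padding (which the paper does not specify), and a reduced input size, not the trained network.

```python
# Sketch of an inception layer: four parallel convolutions at different
# kernel scales, concatenated along the channel axis. Naive and slow,
# for illustration only; weights are random, not trained.
import numpy as np

def conv2d_same(x, kernels):
    """'Same'-padded 2D convolution: x is H x W, kernels is K x K x C_out.
    Returns H x W x C_out."""
    k = kernels.shape[0]
    pl = (k - 1) // 2
    xp = np.pad(x, ((pl, k - 1 - pl), (pl, k - 1 - pl)))
    h, w = x.shape
    out = np.empty((h, w, kernels.shape[2]))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k]
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1], [0, 1]))
    return out

def inception_layer(x, kernel_sizes=(2, 3, 5, 7), channels=20, rng=None):
    """Parallel branches at the four kernel scales named in the text,
    concatenated channel-wise, followed by a ReLU."""
    if rng is None:
        rng = np.random.default_rng(0)
    branches = [conv2d_same(x, rng.standard_normal((k, k, channels)))
                for k in kernel_sizes]
    return np.maximum(np.concatenate(branches, axis=-1), 0.0)

img = np.random.default_rng(1).standard_normal((32, 32))  # small stand-in
features = inception_layer(img)  # 32 x 32 x 80 feature map
```

With four 20-channel branches, the concatenated output carries 80 channels, which then feed the two 32-channel shallow convolutions described above.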