1 Introduction

In Bin-Picking, entangled workpieces are a common cause of incorrect handling. To increase the robustness of grasps, the application can be extended with an entanglement detection and, furthermore, with separation methods [1,2,3,4,5]. It has been shown that such an entanglement detection can be realised with neural networks in a model-based approach [6]. However, when applied to new workpiece geometries, this supervised learning approach requires considerable effort.

The current entanglement detection [6] uses a deep convolutional neural network. The architecture is inspired by DenseNet [7] and is trained in a supervised manner on grayscale depth maps of potentially entangled situations.

The depth maps are generated in a simulation and later transferred to reality using Sim-to-Real methods. To conduct the Sim-to-Real transfer, we use CycleGAN as a domain adaptation method and several domain randomization parameters, for example Gaussian noise on the input images. As the approach is model-based, the simulation needs the geometric information of the workpiece. Each workpiece therefore requires its own data generation and training for the entanglement detection. To obtain a high-performance entanglement detector, the training of the neural network requires up to 20,000 depth maps as training inputs. Data generation and training together take 46 h on standard hardware. In summary, there is potential to reduce the effort of adapting the current entanglement detection to new workpieces.
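The Gaussian-noise domain randomization mentioned above can be illustrated with a minimal sketch. It assumes that a depth map is a nested list of grayscale values in [0, 1] and that the noise level is drawn anew for every image. The function names, the value range and the `max_sigma` parameter are hypothetical and not taken from the original implementation.

```python
import random


def add_gaussian_noise(depth_map, sigma, rng):
    """Return a noisy copy of a depth map, clamped to the assumed range [0, 1]."""
    return [
        [min(1.0, max(0.0, value + rng.gauss(0.0, sigma))) for value in row]
        for row in depth_map
    ]


def randomize_depth_map(depth_map, max_sigma=0.05, seed=None):
    """Draw a random noise level in [0, max_sigma] and apply it to one depth map.

    max_sigma is an assumed, illustrative value.
    """
    rng = random.Random(seed)
    sigma = rng.uniform(0.0, max_sigma)
    return add_gaussian_noise(depth_map, sigma, rng)
```

Drawing the noise level per image, instead of fixing it, is one common way to let the network see a range of sensor qualities during training.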

Meta-Learning has shown great success in accelerating the adaptation of neural networks and in creating strong classification models from only a few data samples. For this reason, we investigated different Meta-Learning methods for their suitability to reduce the effort of training the entanglement detection for new workpiece geometries.

In summary, the main contributions of this paper are:

  • a base dataset for Meta-Learning based entanglement detection

  • a comparison of Meta-Learning methods applied to the entanglement detection

  • a validation of the practical feasibility of reducing the effort of adapting the entanglement detection to new workpiece geometries with Meta-Learning

2 Meta-Learning

Meta-Learning enables machine learning models to use experience gained from related tasks [8]. It transfers knowledge learned in previous training processes and thereby enables a neural network to learn a new task faster and better. Meta-Learning is a learning process on two levels; the exact procedure depends on the Meta-Learning method and is explained in the course of this work. Meta-Learning is used to realise powerful classification models with only a small amount of training data. In this way, several Meta-Learning methods achieve high performance in few-shot image classification [9,10,11,12,13,14] or object detection [15,16,17].

As interest in Meta-Learning grows, a variety of Meta-Learning methods has emerged, which can be divided into gradient-based and metric-based algorithms, among others [18]. For the entanglement detection we perform experiments with the gradient-based algorithms Reptile [10] and the more complex MAML [9]. MAML achieves great success in generating a task-agnostic network which can adapt to new tasks in a few gradient steps. To this end, MAML uses second-order derivatives for the meta-gradient. The Reptile algorithm simplifies the method by avoiding second-order derivatives and is able to meta-learn successfully with fewer classes that are sufficiently populated [19]. As a metric-based algorithm we test TAMS [20], which is based on prototypical networks [11] and designed for medium-shot applications. Since the entanglement detection has the character of a medium-shot classification, with fewer available classes but sufficient shots, we chose Reptile and TAMS for the experiments in addition to MAML.
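To make the first-order character of Reptile concrete, the following minimal sketch shows one batched Reptile meta-step on a plain parameter list. It is a generic illustration of the algorithm from [10], not the implementation used in this work. The names `sgd_steps` and `reptile_meta_step`, the hyperparameter values and the representation of a task as a gradient function are assumptions made for this example.

```python
def sgd_steps(params, grad_fn, lr, steps):
    """Inner loop: adapt a copy of the parameters to one task with plain SGD."""
    weights = list(params)
    for _ in range(steps):
        grads = grad_fn(weights)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


def reptile_meta_step(meta_params, task_grad_fns,
                      inner_lr=0.1, inner_steps=5, meta_lr=0.5):
    """Outer loop: move the meta-parameters towards the task-adapted parameters.

    Only first-order gradients are evaluated in the inner loop; no second-order
    derivatives are needed. All hyperparameter defaults are illustrative.
    """
    n_tasks = len(task_grad_fns)
    mean_delta = [0.0] * len(meta_params)
    for grad_fn in task_grad_fns:
        adapted = sgd_steps(meta_params, grad_fn, inner_lr, inner_steps)
        for i, (a, m) in enumerate(zip(adapted, meta_params)):
            mean_delta[i] += (a - m) / n_tasks
    return [m + meta_lr * d for m, d in zip(meta_params, mean_delta)]
```

As a toy usage example, two tasks with the quadratic losses 0.5 (w - 1)^2 and 0.5 (w - 3)^2 can be passed as `[lambda w: [w[0] - 1.0], lambda w: [w[0] - 3.0]]`. Repeated meta-steps then move a single parameter towards 2, the point from which both tasks are equally easy to reach.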

3 Meta-Learning Applied to the Entanglement Detection

This section presents a brief overview of the base dataset used for the Meta-Learning based entanglement detection. Furthermore, it introduces the investigated Meta-Learning methods as applied to the entanglement detection.

3.1 Base Dataset

Successful Meta-Learning requires a base dataset of source tasks closely related to the later target task. In this case, the target task is the classification between entangled and non-entangled workpieces of an unknown geometry. Therefore, the base dataset consists of 54 different workpieces with various geometries. Each workpiece provides entanglement detection as a classification task with 200 synthetic depth maps, half of them showing entangled workpieces.

To validate the Meta-Learning implementation, the Omniglot [21] dataset was used. This dataset consists of images of handwritten characters and is similar to the depth maps in that both have only one channel. Even though the two image datasets differ in their structure, pretraining the models with Omniglot benefited the Meta-Learning. Omniglot was therefore used both to pretrain the models and to verify the functionality of the Meta-Learning methods.

3.2 Implementation to the Entanglement Detection

The Meta-Learning based entanglement detection was implemented as a K-shot N-way classification, where K is the number of training samples per class and N the number of classes to be distinguished.

Fig. 1. Meta-Learning with depth maps of a) connecting rods, b) u-bolts, and c) double hooks, for faster adaptation on the entanglement detection of d) metal holder.

The Meta-Learning based entanglement detection is implemented such that for each meta-batch N/2 workpiece geometries are sampled randomly from the base dataset, each contributing the two classes entangled and non-entangled. Based on preliminary experiments, both gradient-based algorithms use a 5-shot 6-way classification. One meta-batch therefore represents the simultaneous entanglement detection of three different workpiece geometries. This scheme is sketched in Fig. 1. For TAMS, a 5-shot 20-way classification was selected as the best hyperparameter setting for the entanglement detection application. After each Meta-Learning epoch, the K-shot N-way classification is repeated with unknown geometries to evaluate the Meta-Learning model without updating it.
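The sampling of one meta-batch described above can be sketched as follows. The sketch assumes that the base dataset is a dictionary mapping each geometry name to its depth maps, grouped under the hypothetical keys "not_entangled" and "entangled", and that the two states of a geometry are treated as two separate classes. The label numbering is likewise an assumption of this example.

```python
import random


def sample_meta_batch(base_dataset, n_way=6, k_shot=5, rng=None):
    """Sample the support set of one K-shot N-way task from N/2 geometries."""
    if n_way % 2 != 0:
        raise ValueError("n_way must be even: two classes per geometry")
    rng = rng or random.Random()
    geometries = rng.sample(sorted(base_dataset), n_way // 2)
    support = []
    for g_idx, name in enumerate(geometries):
        for s_idx, state in enumerate(("not_entangled", "entangled")):
            label = 2 * g_idx + s_idx  # one label per (geometry, state) pair
            for depth_map in rng.sample(base_dataset[name][state], k_shot):
                support.append((depth_map, label))
    rng.shuffle(support)
    return geometries, support
```

With `n_way=6` and `k_shot=5` this yields 30 labelled depth maps from three geometries; with `n_way=20` it draws from ten geometries.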

The Meta-Learning results in a strongly generalised network for the entanglement detection of multiple workpiece geometries. In order to fine-tune it to the unknown workpiece geometry with transfer learning later, the classification layer of the generalised network is modified to a binary classifier.
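One possible reading of this modification is sketched below. The sketch assumes that the meta-learned feature layers are kept unchanged and that only the N-way classification layer is exchanged for a freshly initialised two-output layer. The dictionary layout of the model, the function name and the initialisation scale are hypothetical; the source does not state how the layer is modified in detail.

```python
import random


def replace_classification_head(model, n_outputs=2, seed=0):
    """Return a new model dict that reuses the backbone and has a new head.

    `model` is a hypothetical dict with a "backbone" entry (meta-learned layers)
    and a "head" entry whose "weights" hold one row per class.
    """
    rng = random.Random(seed)
    n_features = len(model["head"]["weights"][0])
    new_head = {
        # small random weights; the scale 0.01 is an assumed, illustrative value
        "weights": [[rng.gauss(0.0, 0.01) for _ in range(n_features)]
                    for _ in range(n_outputs)],
        "bias": [0.0] * n_outputs,
    }
    return {"backbone": model["backbone"], "head": new_head}
```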

3.3 Training of the Meta-Learning Methods

The training of the Meta-Learning applied to the entanglement detection is monitored using a subset of workpiece geometries previously separated from the base dataset. We use a split of 46 workpiece geometries for training and eight workpiece geometries for validating the adaptability. Figure 2 shows example training plots from the Reptile and MAML Meta-Learning. In some training runs the validation accuracy is higher than the training accuracy. We explain this behavior by the quality of the training and validation data. The training data was generated some time ago with an outdated physics simulation, while the validation data comes from a revised version. In detail, the data acquisition with a virtual depth image sensor and the physical interaction of the components in the bin have been improved through optimizations in the simulation.
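The split described above is made at the level of workpiece geometries, not of individual depth maps, so that validation tasks contain only geometries the meta-learner has never seen. A minimal sketch, assuming the geometries are identified by name and the split is drawn at random with a fixed seed (the function name and the seed are hypothetical):

```python
import random


def split_geometries(names, n_val=8, seed=0):
    """Split geometry names into disjoint training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]
```

For 54 geometry names and `n_val=8`, this returns 46 training and eight validation geometries.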

Fig. 2. Meta-Learning applied to the entanglement detection. a) Reptile, b) MAML

The Reptile Meta-Learning converges within 5,000 Meta-Learning epochs, which takes about 4.5 h. The MAML Meta-Learning needs about 24 h for 8,000 epochs and then starts overfitting. This is due to the few source tasks in the base dataset. With a larger base dataset containing hundreds of different workpiece geometries, the performance of MAML is expected to improve. The TAMS algorithm also suffers from the few classes and does not make any significant progress in the Meta-Learning. The structure of the base dataset, with sufficient labeled data points per class but only a few training tasks, fits the Reptile algorithm best [19].

4 Results

4.1 Comparison of Applied Meta-Learning Methods

To compare the Meta-models generated by Reptile, MAML and TAMS, we use three new workpiece geometries that were not used in the Meta-Learning process. We also add a model with randomly initialized network parameters to the comparison, which has to learn the entanglement detection from scratch. We test the adaptation of the four models as a function of the amount of training data. While the amount of training data varies, the 2,500 depth maps for testing remain the same for each workpiece. We repeat each adaptation training eight times with different dropout rates for regularisation and afterwards record the best performance for each amount of training data. Figure 3 shows the results for the three chosen workpiece geometries.
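The selection step of this protocol, keeping the best of the repeated runs for each amount of training data, can be sketched as follows. The sketch assumes that each run is recorded as a tuple of training data amount, dropout rate and test accuracy; this record layout and the function name are hypothetical.

```python
def best_per_amount(runs):
    """Map each training data amount to its best (dropout_rate, test_accuracy).

    `runs` is an iterable of (n_train, dropout_rate, test_accuracy) tuples.
    """
    best = {}
    for n_train, dropout_rate, accuracy in runs:
        if n_train not in best or accuracy > best[n_train][1]:
            best[n_train] = (dropout_rate, accuracy)
    return best
```

Applied once per model and workpiece, this yields one value per training data amount, which corresponds to one curve in the comparison.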

Fig. 3. Best models after 100 epochs of adaptation training to new workpiece geometries for a) connecting rods, b) double hooks, c) u-bolts

Comparing the workpiece geometries with each other, it is noticeable that the entanglement detection of the connecting rods is easier to learn than that of the u-bolts, which results in higher test accuracies. The four models therefore do not differ much in performance after the adaptation to the connecting rods. For the double hooks and the u-bolts, the Reptile model outperforms the other Meta-models and the random model by far for nearly every amount of training data. The biggest benefit of the Meta-Learning can be observed in the adaptation to the double hooks with 2,500 depth maps.

4.2 Performance Validation of the Meta-trained Entanglement Detection on Unseen Workpiece Geometries

The model comparison leads to the choice of the Reptile algorithm as the Meta-Learning method to reduce the effort of training the entanglement detection for new workpiece geometries. This method is validated once more with the metal holder shown in Fig. 1d, a workpiece that is of interest for industrial applications at a customer site. In doing so, we record the direct gain of Reptile as a method to adapt to new workpieces with less effort, referred to as the scaling method in the following.

For this purpose, the Reptile model and a model without prior knowledge from Meta-Learning are adapted using the same depth maps and identical training parameters. Figure 4 shows the progression of test accuracy and test loss during training with a training dataset of 2,500 data samples (green) for the Reptile model and 2,500 (blue) and 5,000 (purple) data samples for the random model. The test dataset consists of the same 5,000 depth maps in all cases.

Fig. 4. Adaptation to the new workpiece geometry with and without Meta-Learning

One can observe that the Reptile model immediately starts adapting to the new workpiece and reaches a high classification performance in significantly fewer training epochs than the model without Meta-Learning. If the training data is doubled to 5,000 instances, the model without Meta-Learning can reach a performance similar to that of the Reptile model, but only after a significantly larger number of epochs. In all cases, the loss of the adaptation training indicates that the model starts overfitting after it has converged. The test accuracy, however, remains stable, and Reptile outperforms the current state of the entanglement detection at 100 epochs and 2,500 training samples.

5 Summary and Outlook

In this paper we introduced a scaling method to reduce the effort of training new workpiece geometries for entanglement detection in Bin-Picking. We compared three Meta-Learning methods from the current state of the art with regard to their usability for the entanglement detection. The chosen scaling method based on the Reptile algorithm helps to reduce the number of training epochs, and therefore the training time, by about 80 percent and halves the amount of training data, which also halves the simulation time.

The scaling method enables the entanglement detection to respond faster to new workpiece geometries in the Bin-Picking application. However, to achieve an entanglement detection with a higher performance than with Meta-Learning, more training data can be used to train a workpiece-specific machine learning model. In conclusion, the Meta-Learning method helps to react quickly to customer requests, and the entanglement detection model can be updated to a more accurate one later.

To further improve the scaling method in future work, it is of interest how the Meta-Learning improves as the base dataset grows with new workpiece geometries.