1 Introduction

Humans have the ability to manipulate objects in their daily lives with ease. However, robots need an explicit model to transfer knowledge from camera observations to their hands or base. This issue is commonly known as hand–eye calibration. In the literature, the hand–eye calibration problem has been extensively investigated from the mathematical optimisation perspective for the one-camera configuration [1,2,3,4]. This means the estimated hand–eye calibration parameters are valid only for the current camera and robot. However, the camera pose may change with respect to the robot base because of contact with the environment or layout changes in the workspace. Hence, a flexible and autonomous hand–eye calibration system is crucial for designing a more adaptable industrial manufacturing system.

Recent advancements in deep learning architectures, such as convolutional neural networks, have led to the development of hand–eye calibration systems [5,6,7] capable of estimating different camera poses with respect to a single robot base, thus providing greater flexibility and adaptability to industrial manufacturing systems. However, these deep learning-based approaches have limitations. They are typically trained offline on fixed datasets, resulting in static models that cannot adapt to new data or environmental changes. These models require retraining when the environment or the relative positions of the camera and robot fall outside the learned space. This retraining process necessitates extensive data storage and management, which can become cumbersome. It also requires stopping robotic manipulation tasks during training, leading to costly downtime. These approaches are also highly dependent on the sensor used during training. If the sensor is changed post-training, the calibration error tends to increase due to the models’ static nature. Similarly, Recurrent Neural Networks (RNNs) [8] share this static characteristic, lacking the capability for incremental learning. Upon the arrival of new data, the entire network must be retrained from scratch. Additionally, RNNs suffer from catastrophic forgetting, where the network loses previously acquired knowledge when extensively processing new data. As the sequence length increases, the computational cost escalates, and the vanishing gradient problem [8] becomes more pronounced.

In contrast, humans can continuously update and refine their internal hand-to-eye model based on new experiences. In robotic manipulation, continuous learning can be achieved through Continual Learning (CL) [9], an active and ongoing learning paradigm in deep learning where a model is trained on a continuous data stream over time. By utilising CL, robots can learn and adapt similarly to human learning, enabling them to perform more complex tasks and operate in a broader range of environments.

One of the main challenges in CL is the catastrophic forgetting problem [9], whereby previously learned information degrades over time as new data is processed. To overcome this problem, three main strategies have been proposed: regularisation [10,11,12], modular architectural design [13, 14] and rehearsal (buffer) models [15,16,17]. The regularisation methods aim to prevent catastrophic forgetting by adding terms to the objective function to control changes in model weights. Modular architecture design methods mitigate the catastrophic forgetting problem by dedicating sub-modules to different tasks or expanding the network architecture when a new task is defined. Rehearsal approaches periodically replay portions of stored old data to the model to prevent the catastrophic forgetting problem.

Fig. 1

This figure depicts this paper’s real-world robotic experimental setup. The robotic manipulator (labelled in red) is positioned on a stable platform (table) and outfitted with a three-fingered gripper (labelled in blue). The experiment systematically alters the robot’s configuration by manipulating the gripper’s final joint (link), thereby exploring the robot’s workspace. Additionally, a camera (labelled in green) is strategically positioned in various configurations throughout the experiment to span the hand–eye calibration space comprehensively (colour figure online)

While humans can manipulate objects effortlessly, robots require an explicit model to transfer knowledge from the camera to their hands or base. Hand–eye calibration, a well-known concept in the literature, has been extensively studied via mathematical optimisation for the one-camera configuration. However, the estimated hand–eye calibration parameters are only applicable to the current camera–robot pair. This limitation has significant implications for the accuracy and efficiency of robotic systems, making our research particularly relevant and impactful in fields such as manufacturing and healthcare.

In this paper, we introduce a novel Continual Learning hand–eye calibration approach, which extends the learned camera extrinsic parameters over the robot’s working volume. This approach tackles the issue of the camera pose changing over time, a factor that can impact the accuracy of robotic manipulation tasks. To overcome this, we propose three innovative approaches: naive CL, reservoir rehearsal, and reservoir rehearsal with camera pose selection. The naive CL approach, which updates the model with new data, can result in catastrophic forgetting of previously learned information. To counter this, we employ the reservoir buffer approach, which samples uniformly from old data at the previous time step and combines it with new data. Finally, we develop a hybrid approach that combines old data sampling (reservoir) and evaluates new data for the presence of new information.

Our experiments include simulated and real-world environments. We placed a stereo vision camera at 108 different locations in the simulation environment, and we ran 50 different end-effector configurations for each camera pose to collect samples. As for the real world (presented in Fig. 1), we placed a stereo vision camera at 24 different locations and ran 100 end-effector configurations for each camera pose. We divided our datasets into subsets to train our models in a Continual Learning manner and evaluated the success of our approach based on the accuracy of the hand–eye calibration in both environments. Finally, we compared the performance of our three Continual Learning approaches with batch training for both environments.

To demonstrate the proposed methods’ capability to accurately estimate the hand–eye calibration matrix beyond the learned space, we leverage CL to extend the learned hand–eye calibration space effectively. The network is updated using current (new) data while incorporating sampled past observations. This dynamic updating process enables the network to adapt to new data and maintain performance accuracy even when confronted with significant deviations from the initially learned calibration parameters. The results of our study demonstrate that, except for the naive CL approach, our newly developed CL-based methodologies yield outcomes competitive with the batch training approach. These results demonstrate the flexibility and robustness of our CL-based approaches, particularly in scenarios involving significant variations in camera poses that extend beyond the initially learned calibration space. Furthermore, our approach enables efficient adaptation to novel camera poses outside the learned calibration space without the need for extensive data storage or the retraining of the network on the complete dataset. This feature underscores our proposed method’s practicality and resource efficiency in scenarios where camera pose changes are common.

2 Related work

The literature review is divided into three sections, each providing an in-depth analysis of existing research. Section 2.1 focuses on hand–eye calibration, Sect. 2.2 examines deep learning-based hand–eye calibration (our baseline), and Sect. 2.3 addresses Continual Learning.

2.1 Hand–eye calibration

The hand–eye calibration problem has been intensively investigated through mathematical optimisation approaches. There are three mainstream approaches to formulate this problem: \(AX=XB\), \(AX=YB\), and reprojection error.

In the \(AX=XB\) approach, A and B represent the relationship of the coordinate poses in n different frames for the robot’s end-effector and the camera, respectively. Meanwhile, X represents the unknown static transformation between the camera and the robot’s end-effector. In this formulation, A and B are the observable variables via the robot kinematic chain and camera calibration, respectively. Each of A, B and X comprises a rotation and a translation component representing a transformation in 3D Cartesian space. The solution of this formulation can be divided into two categories: separation and simultaneous approaches. Separation-based approaches [1, 4] first solve the rotational component of the unknown transformation (X) and then find the translational component using the estimated rotation. These approaches transfer errors from the rotational to the translational component. To eliminate this error transfer, simultaneous approaches [18,19,20] handle these two components simultaneously with nonlinear objective functions. However, these approaches require good initialisation parameters for the nonlinear optimisation solver. Additionally, the solution quality for \(AX=XB\) highly depends on the sampling of A and B.
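Writing each homogeneous transform as a rotation–translation pair \((R,\,t)\), the \(AX=XB\) constraint splits into the rotational and translational equations that separation-based approaches solve sequentially (a standard decomposition, stated here for reference):

$$\begin{aligned} R_{A}R_{X}&= R_{X}R_{B} \\ R_{A}t_{X}+t_{A}&= R_{X}t_{B}+t_{X} \end{aligned}$$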

\(AX=YB\) eliminates the necessity of good sampling of A and B by defining one extra unknown variable, Y, which represents the static transformation between the end-effector of the robot and the calibration target. Observable variables, A and B, represent the pose of the end-effector and the camera. In this formulation, separation [3, 21, 22] and simultaneous [2, 21, 23, 24] approaches have been proposed, similar to the \(AX=XB\) formulation.

The formulations, \(AX=XB\) and \(AX=YB\), fundamentally rely on the geometric relationships within the three-dimensional Cartesian space. The solutions derived from these formulations are particularly susceptible to errors stemming from camera calibration, denoted by the variable B in both formulations. To mitigate this issue, reprojection error-based approaches [25, 26] have emerged. These approaches utilise the disparity between the estimated and actual feature key points in the two-dimensional image plane. These feature key points are either predefined or extracted from the images. By quantifying this disparity, reprojection error serves as the guiding metric in determining optimal hand–eye calibration parameters, striving to enhance the robustness and accuracy of the calibration process.
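A generic form of such a reprojection objective (the exact parameterisation and chain of transforms differ across [25, 26] and between eye-in-hand and eye-to-hand setups) minimises the squared image-plane distance between the observed keypoints \(p_{i}\) and the projections of their 3D counterparts \(P_{i}\) through the camera intrinsics \(K\) and the sought hand–eye transform \(X\):

$$\begin{aligned} X^{*}=\mathop {\arg \min }\limits _{X}\sum _{i=1}^{N}\left\| p_{i}-\pi \left( K\,X\,P_{i}\right) \right\| ^{2} \end{aligned}$$

where \(\pi (\cdot )\) denotes perspective division.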

Fig. 2

The figure illustrates the hand–eye calibration architecture, which consists of two encoders that extract features from RGB and depth images individually. The extracted features are combined with the pose of the reference point, represented by a blue circle, and fed into three fully connected layers. The network outputs either the translation component in 3D or the orientation component in 4D [5]

The above approaches are known as classic hand–eye calibration in the literature, and they are not robust to changes in the camera pose with respect to the robot base. That is, they are only valid for the current camera and robot configuration, and when this configuration changes, the hand–eye calibration process has to be performed from scratch to obtain valid hand–eye calibration parameters. To address this limitation, researchers have proposed deep learning-based hand–eye calibration methods. For example, Lambrecht [27] and Lee et al. [7] used deep learning as a keypoint detection method, followed by the Perspective-N-Point (PnP) algorithm [28] to obtain the hand–eye calibration parameters. Valassakis et al. [6] employed an end-to-end deep learning architecture that directly estimates the hand–eye calibration parameters, where the camera is attached to the end-effector of the robot. In our previous work [5], we proposed a deep-learning architecture that estimates the hand–eye calibration parameters directly from a single pair of RGB and depth images. Although [6] and [5] achieved flexibility for a wide range of hand–eye calibration spaces, they are trained offline and cannot adapt to changes in data distribution and the robotic environment on-the-fly.

2.2 Deep learning-based hand–eye calibration

Our previous work [5] proposed a supervised learning approach to estimate the camera’s pose from RGB and depth images and the pose of a single reference point on the robot manipulator. For this, we [5] employed two separate encoders (depicted in Fig. 2) to extract features from the RGB and depth images, which are then concatenated with the reference point’s pose. We must note that we do not explicitly segment this reference point on the end-effector. Due to the robot’s kinematic chain, which is used as input for the designed network, its pose relative to the robot base is known. Hence, the occlusion of this reference point does not impact our results. The resulting feature vectors are fed into a neural network that estimates either the translation or orientation component of the camera pose. By adopting this approach, we [5] avoided the need for a calibration target, as in traditional approaches such as Tsai’s hand–eye calibration [1], and implicitly handled the camera extrinsic parameters for each end-effector’s pose.

To train the network for the translation component (measured in millimetres) of the hand–eye calibration parameters, we used the Mean Squared Error (MSE) loss function. The Root Mean Squared Error (RMSE) between the ground truth and the predicted values was employed for performance evaluation. We adopted a 10-dimensional quaternion representation approach described in [29] for the orientation component. This method transforms the network’s output of 10 values into a 4D unit quaternion, ensuring compatibility with any network architecture and integration with the main network. Additionally, this approach addresses the double cover problem inherent in 4D unit quaternions by using one half-space of the unit quaternion. The network training for the orientation component was conducted using Eq. 1, which calculates the distance between two given unit quaternions. To interpret the orientation parameters, they were converted to degrees using Eq. 2.

$$\begin{aligned} L_{chord}(R,R_{gt})=\Vert R_{gt}-R\Vert _{F}^{2} \end{aligned}$$
(1)
$$\begin{aligned} angle=\frac{720}{\pi } \sin ^{-1}\left( 0.5 \cdot \min \left( \Vert R_{gt}-R\Vert _{F}, \Vert R_{gt}+R\Vert _{F}\right) \right) \end{aligned}$$
(2)

where R and \(R_{gt}\) represent the estimated and ground-truth unit quaternions, respectively, and \(\Vert \cdot \Vert _{F}\) denotes the Frobenius norm.
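A minimal PyTorch sketch of Eqs. 1 and 2 (our own illustration; the batching convention and tensor shapes are assumptions rather than the exact implementation of [5]):

```python
import math
import torch

def chordal_loss(q_pred: torch.Tensor, q_gt: torch.Tensor) -> torch.Tensor:
    """Eq. 1: squared chordal distance between predicted and ground-truth unit quaternions."""
    return ((q_gt - q_pred) ** 2).sum(dim=-1).mean()

def angular_error_deg(q_pred: torch.Tensor, q_gt: torch.Tensor) -> torch.Tensor:
    """Eq. 2: angular error in degrees; the min over +/-q handles the quaternion double cover."""
    d = torch.minimum((q_gt - q_pred).norm(dim=-1), (q_gt + q_pred).norm(dim=-1))
    return (720.0 / math.pi) * torch.asin(torch.clamp(0.5 * d, max=1.0))
```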

To achieve hand–eye calibration in both simulated and real-world environments, a stereo-vision camera was placed in various configurations. The camera’s pose relative to the robot base was determined using Tsai’s hand–eye calibration approach [1], which was repeated at least five times to ensure accurate calibration. The selection of end-effector movements during data collection was found to be critical for a successful calibration. Once the camera was calibrated, the robot’s end-effector was run through multiple configurations to map out the robot’s workspace. During these tests, both RGB and depth images were recorded, along with the poses of the reference point for each end-effector and camera configuration.

Our previous results [5] indicate that our approach outperforms classical hand–eye calibration approaches by a factor of 96 in terms of repeatability (mm), demonstrating its ability to produce the same results consistently under identical conditions. Furthermore, the hand–eye calibration error achieved by [5] is comparable to traditional methods while eliminating the need for data recollection. Overall, [5] enables the acquisition of hand–eye calibration parameters without data recollection after the initial training within the learned space. This renders it applicable to various robotic contexts, particularly in dynamic environments necessitating periodic recalibration within the learned space (constrained area).

2.3 Continual learning

Continual Learning (CL) enables deep learning models to learn continuously, updating and adapting to new data streams or batches. Specifically, Continual Learning refers to the ability of a model to learn continuously from new data without forgetting previously learned concepts. This enables models to evolve and improve over time as new data becomes available. These approaches are characterised by two main features: stability and plasticity. Stability refers to the ability of the model to preserve previously learned concepts, and poor stability can result in catastrophic forgetting [9]. Plasticity represents the model’s ability to adapt to new data.

CL approaches can be categorised as task-incremental, domain-incremental, and class-incremental learning [30]. In the task-incremental scenario, independent tasks are learned with new data over time. An example of this scenario in a robotics context consists of learning to recognise sub-skills, such as reaching, grasping, and placing, to accomplish a pick-and-place task [31]. In the second scenario, the domain-incremental approach maintains consistency in the learning task while accommodating evolving data distributions. An illustrative scenario is encountered in autonomous driving, where a pre-trained object detection model must adapt to diverse environmental conditions like varying weather patterns, seasonal changes, or different geographical locations [32]. In the class-incremental scenario, the model learns an increasing number of new classes using new data over time. An example of this scenario is learning new household objects for service robotics [33].

The inherent stability-plasticity predicament within Continual Learning (CL) paradigms, arising from the challenge of retaining past knowledge while accommodating new information, necessitates thoughtful solutions. In the context of CL, a common issue is the potential for catastrophic forgetting when a neural network processes only the data from the current time step. To mitigate this issue, CL can be categorised into three key solution categories: parameter isolation (architectural design), regularisation-based approaches, and replay-based (memory) approaches. Parameter isolation methods, often achieved through architectural design, manage the stability-plasticity problem by assigning dedicated parameters to each task. In dynamic models, the network evolves as new tasks are integrated into the training dataset. Weights of previous tasks are treated differently, with options including freezing [13] or copying to preserve past knowledge [14]. On the other hand, static models employ unchanging architectures with task-specific weight sets [34, 35]. These solution strategies strike a crucial balance between maintaining the stability of previously acquired knowledge and fostering the plasticity required for new information, offering a comprehensive response to the stability-plasticity challenge in CL. The choice among these strategies hinges on the specific demands and constraints of the given learning problem.

Regularisation-based approaches offer valuable mechanisms for preserving essential knowledge while accommodating new information. These strategies typically involve adapting loss functions to penalise substantial changes in critical weights. This is achieved through two primary strategies: parameter importance estimation and knowledge distillation. The parameter importance estimation approach focuses on identifying and safeguarding essential weights within the neural network by using Elastic Weight Consolidation [10, 11], Memory Aware Synapses [36], or Deep Model Consolidation [12]. On the other hand, the knowledge distillation strategy is concerned with preserving previously consolidated knowledge while learning new tasks. This is accomplished by employing knowledge distillation functions [37, 38].

Replay methods sample previously used raw data while updating the model with new stream data to overcome the stability problem. Several methods have been employed to select samples from previous data. A straightforward approach is random selection [15], which involves randomly choosing observations at regular intervals. While this method is easy to implement and provides a diverse representation of past data, it may overlook important features and focus on unnecessary data. Additionally, it does not ensure an even distribution of classes, leading to uneven data representation. Reservoir sampling [17], on the other hand, uses a normal distribution to stochastically sample data instances from a historical sample reservoir, thereby preserving a diverse and representative subset of the training set. However, it also does not guarantee an even distribution of classes. To address this issue, class balance approaches have been introduced [16], which ensure equal representation of each class by selecting samples within classes. Replay methods directly address the catastrophic forgetting problem by sampling previous data; in contrast, parameter isolation approaches try to handle this problem by segregating parameters associated with different datasets. This segregation cannot be easily defined in complex neural networks because parameters in these networks are highly interconnected. As for regularisation, its parameters are highly task-dependent and require hyperparameter fine-tuning for each task. Hence, these approaches have limited generalisability.
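As a concrete illustration, the sketch below shows the classic reservoir sampling update, which keeps a fixed-capacity buffer representative of the stream seen so far; the specific sampling distribution and the per-class bookkeeping of class-balanced variants [16] are omitted and may differ from the cited implementations:

```python
import random

def reservoir_update(buffer, capacity, item, n_seen):
    """Maintain a fixed-size buffer so that each of the n_seen items observed so far
    remains in the buffer with probability capacity / n_seen."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = random.randint(0, n_seen - 1)  # random index over the whole stream so far
        if j < capacity:
            buffer[j] = item               # overwrite a buffer slot; otherwise discard the item
```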

To the best of the author’s knowledge, no study has investigated hand–eye calibration with a Continual Learning approach. The closest related work can be found in the study conducted by Wang et al. [39]. Wang et al. [39] developed a CL-based visual localisation approach, which sequentially trained a network with a novel buffer system. They combined reservoir and class-balance buffer methods to overcome the catastrophic forgetting problem. The reservoir method enabled them to sample previous data with uniform distribution, while the class-balance method ensured that selected samples represented all scenes.

It is worth noting that their investigation is centred around the domain of visual localisation, particularly in the context of neural network training with a novel buffer system. The distinctiveness of our research lies in its application of CL to the intricate domain of hand–eye calibration, specifically to extend the learned calibration space. This critical divergence underscores the innovative and unexplored nature of our investigation, positioning it as a pioneering study within the field.

3 Methodology

In our prior research [5], we developed a deep learning-based method for hand–eye calibration that enables continuous estimation of calibration parameters within a specified region. This was achieved by placing the camera at multiple locations and performing various configurations of the robot end-effector for each camera pose. It is important to note that all camera and end-effector poses were assumed to be known at the outset of the training phase. Figure 3 illustrates the continuous representation of the 360-degree environment surrounding the robot base, centred within a semi-spherical 3D space. The blue-colored region denotes the learned 3D space, acquired through discrete camera poses (represented as black dots) within this area. Conversely, the red region falls outside the learned space, and the success of the hand–eye calibration model in [5] degrades in this region.

Fig. 3

The figure illustrates the observed and non-observed 3D Cartesian Space 360 degrees around the robot base, which is located in the centre. The observed space is learned continuously by sampling discrete camera poses (black dots) inside this region. The hand–eye calibration model has great success in the learned space; however, its success reduces when the camera is placed outside of this space

In the hand–eye calibration approach based on Continual Learning (CL), the acquisition of camera poses occurs progressively over time, facilitating the expansion of the learned space. We initially collected the data and subsequently divided it into subsets to train the network, thereby mitigating long-tail effects. To address data distribution concerns, we strategically positioned m camera configurations and moved the robot arm through n uniformly selected configurations. We utilised sine and cosine functions in the simulation phase to generate camera poses relative to the robot base frame. This approach ensured an even data distribution throughout the training process.
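A hedged sketch of how camera positions can be spread over a semi-sphere around the robot base with sine and cosine functions; the radius, angular range, and grid size below are illustrative (18 azimuths \(\times \) 6 elevations = 108 poses, matching the number of simulated camera configurations, though the exact values used may differ):

```python
import numpy as np

def sample_camera_positions(radius=1.2, n_azimuth=18, n_elevation=6):
    """Generate camera positions on a semi-sphere centred at the robot base frame."""
    positions = []
    for az in np.linspace(0.0, 2.0 * np.pi, n_azimuth, endpoint=False):
        for el in np.linspace(np.deg2rad(15.0), np.deg2rad(75.0), n_elevation):
            x = radius * np.cos(el) * np.cos(az)  # sine/cosine parameterisation of the pose
            y = radius * np.cos(el) * np.sin(az)
            z = radius * np.sin(el)
            positions.append((x, y, z))
    return np.asarray(positions)                  # shape (108, 3) with the default grid
```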

The collected camera poses can be effectively managed by defining time steps. We propose two methodologies for transforming the hand–eye calibration problem into a continual learning (CL) framework, conceptualised as a time sequence problem. For the Naive and the reservoir buffer with class balance approaches, we define a region containing several camera poses as a single time step. In contrast, for the reservoir buffer with class balance and camera pose selection, each camera pose is treated as a separate time step to facilitate the elimination operation. It is important to note that time steps correspond to the instances at which we update our network.

The first approach involves treating a collection of camera poses representing a specific area within the 3D Cartesian space as a single time step. This approach enables the training and updating of the hand–eye calibration model based on the camera poses encompassed within that particular region. By systematically defining time steps in this manner, the entire space can be learned over time using Continual Learning. Figure 4 provides a visual representation of this concept, illustrating the semi-spherical 3D space around the robot base, as depicted in Fig. 3. In Fig. 4, the space is divided into six distinct regions, each denoted by a different colour. These regions enable the gradual acquisition of knowledge across the entire space by sampling camera poses within each region.

The second approach involves treating each individual camera pose as a separate time step. In this case, each camera pose represents a unique instance in time, and the hand–eye calibration model is updated based on the information provided by each pose individually.

The remainder of this section presents the adopted Continual Learning (CL) approaches to address the hand–eye calibration problem. First, we describe the straightforward CL approach, Naive CL. Following this, we introduce replay-based methods designed to mitigate the issue of catastrophic forgetting. As noted by [39], these techniques have demonstrated robust performance in regression tasks for domain-incremental scenarios.

3.1 Naive approach

The traditional approach of updating a trained model without any buffering is known as the naive approach. This approach uses a small batch size to update the network weights with new observations each time step. The naive approach trains the network with the dataset in the current time step and passes the learned weights to the new time step. Moreover, the naive CL approach evaluates the performance of the trained model for each time step by using all testing subsets. This evaluation shows the performance of the Naive CL approach in both the observed and non-observed spaces.
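A compact sketch of the naive CL loop; train_step and evaluate are hypothetical placeholders standing in for fine-tuning and testing the regression architecture of [5]:

```python
def naive_cl(model, train_subsets, test_subsets, train_step, evaluate):
    """Sequentially fine-tune on each subset; without a buffer, earlier subsets may be forgotten."""
    history = []
    for train_t in train_subsets:
        train_step(model, train_t)  # update weights with the current subset only
        history.append([evaluate(model, s) for s in test_subsets])  # test on all subsets S1..SN
    return history
```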

Fig. 4

This figure depicts a 3D half-sphere surrounding the robot base, divided into six distinct regions. The regions provide a framework for gradually acquiring hand–eye calibration parameters over time, employing Continual Learning. The black dots represent discrete camera poses sampled within all regions, enabling the continuous estimation of parameters throughout the entire space

3.2 Replay-based CL approaches

3.2.1 The reservoir buffer with class balance

The reservoir buffer with class balance approach (Fig. 5) employs two distinct techniques to tackle the catastrophic forgetting problem. Specifically, this approach combines the use of reservoir and class balance techniques. The reservoir component employs a normal distribution to randomly sample from a buffer of previous data points, which helps maintain a diverse and representative sample of the training data. Meanwhile, class balance ensures that the training data is balanced across all classes, in this case, the camera poses. It prevents the model from focusing excessively on one class at the expense of others.

Algorithm 1

The reservoir buffer with class balance algorithm

Algorithm 1 details the network training procedure for the reservoir buffer with the class balance approach. This algorithm requires training and testing datasets in the time domain. It also requires three parameters: \(c_1\), \(c_2\) and \(c_3\). \(c_1\) is the number of samples drawn per previous camera pose, \(c_2\) is the number of camera poses used for sample collection, and \(c_3\) is the fraction of training samples used from the current time step. The dataset partition is detailed in Sects. 4.1 and 5.1, and m camera poses and n end-effector poses are used for training in each time step, which amounts to m \(\times \) n samples (RGB and depth images and the pose of the reference point).
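A hedged Python sketch of how one training set could be assembled at a given time step under Algorithm 1; the grouping of data by camera pose and the helper structure are our assumptions, with \(c_1\), \(c_2\), and \(c_3\) used as defined above:

```python
import random

def build_training_set(current_data, past_data_by_pose, c1, c2, c3):
    """current_data: samples for the current time step.
    past_data_by_pose: {camera_pose_id: [samples]} gathered from previous time steps."""
    # Class-balanced buffer: c1 samples from each of c2 previously observed camera poses.
    chosen_poses = random.sample(list(past_data_by_pose), min(c2, len(past_data_by_pose)))
    buffer = [s for pose in chosen_poses
              for s in random.sample(past_data_by_pose[pose],
                                     min(c1, len(past_data_by_pose[pose])))]
    # c3 is the fraction of the current training data used for the update (e.g. 1.0 for 100%).
    current = random.sample(current_data, int(c3 * len(current_data)))
    return current + buffer
```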

Fig. 5

This figure illustrates the reservoir buffer with the class balance CL approach. It comprises N time steps, where each time step represents a region within the 3D Cartesian subspace (presented in Fig. 4) defined by m distinct camera poses and n distinct end-effector poses. In each time step, the acquired weights are propagated to the subsequent step and updated based on the current data and the samples in the buffer. Furthermore, the buffer undergoes updates every time step by sampling the current data through the class balance selection approach

Fig. 6

This figure visualises the reservoir buffer with the class balance and camera pose selection approach. Unlike previous approaches, the buffer is structured around individual camera poses, treating each pose as a distinct time step. This design allows for gradually expanding the learned space while identifying and eliminating redundant poses lacking novelty. Within each time step, a selection process occurs, wherein 10% of the data associated with the current camera pose is chosen for testing against the previously learned weights. If the average error exceeds a predefined threshold, the weights are updated based on the entirety of the data in the current time step, and the buffer undergoes an update. In cases where no novelty is observed, the weights remain unchanged, and the process advances to the subsequent time step

This approach considers a set of camera poses as a time step, as presented in Fig. 4, which represents a 3D region of the calibration space. In the first time step, the deep learning-based regression architecture was trained on the current dataset, and the learned weights were passed to the next time step. For the following time steps, the learned weights were updated by processing the current training data and buffer data from previous observations. The performance of the approach was evaluated on the current and all test sets for each time step.

3.2.2 The reservoir buffer with class balance and camera pose selection

The reservoir buffer with class balance and camera pose selection is a hybrid method to streamline the processing of camera poses while maintaining the integrity of previously acquired data through the use of a reservoir buffer and the class balance technique. The reservoir buffer plays a crucial role in this process, enabling the retention of relevant data while eliminating redundant information. To determine whether a new camera pose contains novel information, a random sample is drawn and compared to a threshold value, which is determined experimentally and described in the experimental design Sects. 4.2 and 5.2. The network is updated only if the current hand–eye calibration parameters are not already within the learned space. The camera pose selection component reduces data storage and training time, making it more suitable for real-time robotics applications.

Algorithm 2 details the reservoir buffer with class balance and camera pose selection approach. This algorithm requires a threshold for judging whether the current camera pose contains novelty or not. When a new subset is introduced, a sample consisting of 10% of each camera pose is randomly selected using a normal distribution. Next, the mean errors for each camera pose are computed based on the last trained model. If the mean error for any camera pose exceeds a predefined threshold, that camera pose is considered to be novel and marked as such for further processing.
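A minimal sketch of this novelty test; predict_error is a hypothetical helper that runs the last trained model on one sample and returns its calibration error (in millimetres or degrees):

```python
import random

def is_novel(pose_samples, predict_error, threshold, fraction=0.1):
    """Probe 10% of the data for a newly arrived camera pose and mark it as novel
    if the mean error of the previously trained model exceeds the threshold."""
    k = max(1, int(fraction * len(pose_samples)))
    probe = random.sample(pose_samples, k)
    mean_error = sum(predict_error(s) for s in probe) / k
    return mean_error > threshold
```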

Algorithm 2

The reservoir buffer with class balance and camera pose selection algorithm

The training stage starts by processing the dataset consisting of a set of camera poses via the DL-based regression architecture [5] in the first time step. Then this approach considers each camera pose as a new time step (as presented in Fig. 6). Hence, when a new camera pose arrives, Algorithm 2 is used to determine whether the current camera pose contains novelty. If it does, the learned weights are updated by processing this camera pose and the buffer from past observations. The success of the trained model in each time step is evaluated on all test sets to show the progress of the trained model.

4 Simulation experiments

4.1 Data generation and split

Fig. 7

This figure visualises the Universal Robot 5 (UR5) equipped with a parallel gripper in the PyBullet simulation environment

A Universal Robot 5 (presented in Fig. 7) with a parallel gripper was placed in the PyBullet simulation environment (see Footnote 1). The data split is depicted in Fig. 8, which shows a three-dimensional Cartesian space with each subset represented by a different colour. It should be noted that each subset is considered a time step for the naive CL and reservoir buffer with a class balance approach. On the other hand, for the reservoir buffer with a class balance and camera pose selection approach, the first subset (blue) is considered the first time step, and subsequently, each camera pose is treated as a separate time step. The overview of the adopted time step strategies is presented in Table 1.

The training and testing camera configurations for each subset are indicated by dots and stars in Fig. 8. This visualisation allows for a better understanding of the distribution of the camera configurations and the data split across the different subsets.

For each camera pose, the end-effector was moved to 50 different configurations to capture the robot’s movement in various configurations. For each end-effector and camera pose, RGB and depth images and the pose of the reference point on the robot’s end-effector were collected. This process was repeated for all 50 end-effector configurations for each camera pose. The reference point on the robot’s end-effector served as a marker for tracking the robot’s movement and was located at a specific point on the end-effector for consistency.

Table 1 An overview of the time step for continual learning-based approaches in the simulated environment
Table 2 The parameter analysis of the reservoir buffer with a class balance CL approach
Fig. 8

This figure visualises the camera configurations used in the PyBullet simulation environment. The 108 camera configurations are divided into six subsets, each represented by a different colour. The dots and stars indicate the camera configurations used for training and testing within each subset, respectively. The black star in the figure represents the robot’s base, serving as a reference point for the camera configurations (colour figure online)

4.2 Experimental design

4.2.1 Naive CL

The naive Continual Learning approach starts with training the DL-based regression architecture [5] on the first subset (coloured blue), which consists of 14 training camera poses, in the first time step. The learned weights are then updated using the new subsets (\(S_2\) to \(S_6\)) in the subsequent time steps. This approach encompasses six time steps, each containing 14 and 4 camera poses for training and testing, respectively (detailed in Table 1). The performance of the naive CL approach is assessed on each test set across all time steps, thereby providing insights into the model’s progress over time in both the observed and non-observed calibration spaces.

4.2.2 The reservoir buffer with class balance

The reservoir buffer with the class balance approach follows the same training procedure as the naive approach for the first subset. However, for the second and subsequent subsets, the training set of the current subset is augmented with a small sample of the previous subsets using parameters \(c_1\) and \(c_2\). The \(c_3\) parameter uniformly selects samples from the current training set. Then the learned weights in the first time step are updated by using the sampled training set and the buffer set. This process is repeated until all time steps are visited. The model’s success is evaluated on all test sets for each time step.

Table 2 presents the selected parameters and the corresponding buffer and training data sizes. The first row in this table shows the case where the DL-based regression architecture [5] is trained without Continual Learning at each time step, meaning each time step uses the current dataset together with all observed camera poses from past time steps. The following row shows the buffer size according to the \(c_1\) and \(c_2\) parameters and the sampling of the current training dataset using the \(c_3\) parameter. It should be noted that each camera pose consists of 50 RGB and depth images, and each time step has 14 camera poses (detailed in Table 1).

4.2.3 The reservoir buffer with class balance and camera pose selection

The reservoir buffer with class balance and camera pose selection approach is a new method that incorporates the buffer system and camera pose elimination steps. Similar to the previous approach, the DL-based regression architecture [5] is initially trained with the first subset (marked blue in Fig. 9). However, each camera pose is considered a time step in this approach (detailed in Table 1). When a new camera pose arrives, the elimination step is triggered using Algorithm 2 to identify the camera poses marked as novel. The novel camera poses are combined with the buffer dataset obtained using Algorithm 1 and used to update the model, while unmarked camera poses are excluded from the buffer system in the subsequent time steps. This process is repeated until all time steps (camera poses) are covered.

As noted in our previous work [5], the deep learning-based regression architecture comprises two networks: one for estimating the translation and the other for the orientation components of the calibration parameters. For each network, the novelty thresholds for translation and orientation are determined separately, given their distinct solution spaces (millimetres for translation and degrees for orientation). Based on the experimental results in [5], which indicate the error range necessary for effective robotic manipulation, the novelty thresholds were set at 3 mm and 6 mm for translation and 4 degrees and 8 degrees for orientation. The small thresholds (3 mm and 4 degrees) are double the best result acquired by training networks without CL, so they eliminate only unnecessary camera poses. The other threshold values (6 mm and 8 degrees) allow networks to eliminate more camera poses, reducing computational complexity and forcing them to learn more distinctive features. Figure 9 shows the eliminated camera configurations for the translation network with a threshold of 6 mm.

Fig. 9

This figure illustrates the training stage of the reservoir buffer with class balance and camera pose selection approach in a simulation environment. The testing camera configurations for each subset are represented by colourful stars. The eliminated camera configurations, which are marked as not novel through Algorithm 2, are indicated by black crosses (Colour figure online)

Table 3 Naive CL experimental results for the translation
Table 4 Naive CL experimental results for the orientation

4.3 Experimental results

This section presents the individual results for Naive, the reservoir buffer with class balance, and the reservoir buffer with class balance and camera pose selection approaches. Additionally, the comparison of the success of these approaches is presented in this section.

4.3.1 Naive CL experimental results

Tables 3 and 4 show the results of the experiments conducted to evaluate the performance of naive CL at each time step for the translation and orientation components. To this end, the trained model in each time step was tested on all test sets across the entire calibration space (S1 to S6). Each row in the tables represents a specific time step, while each column displays the testing error for the sub-testing sets.

The results indicate that the naive CL approach suffers from catastrophic forgetting in translation and orientation components. Notably, the errors on the diagonal of both tables, where the training and test sets belong to the same subset, are relatively low (only 1.5 times worse than the baseline results), indicating that naive CL can only handle camera configurations within the current spanned space. Therefore, this approach is limited in adapting to new camera configurations outside this space.

4.3.2 Experimental results of the reservoir buffer with class balance

Fig. 10

This figure shows the performance of different parameterisations of \(c_1\), \(c_2\), and \(c_3\) for the translation component with the reservoir buffer with the class balance approach

The experiment evaluated the performance of the reservoir buffer with the class balance approach by computing the error for each subtest set (S1 to S6) at every time step (\(\mathbf {T_1}\) to \(\mathbf {T_6}\)). An ablation study was conducted to find the best values for the \(c_1\), \(c_2\), and \(c_3\) parameters, as mentioned in Sect. 4.2.2. Figures 10 and 11 show the results of the ablation study, which examined the influence of different parameterisations of \(c_1\), \(c_2\), and \(c_3\) on translation and orientation estimation, respectively. The figures display the average error for each time step in the estimation process.

Fig. 11

This figure shows the performance of different parameterisations of \(c_1\), \(c_2\), and \(c_3\) for the orientation component with the reservoir buffer with the class balance approach

The study found that there was no significant difference in performance for translation estimation between the various parameterisations of \(c_1\), \(c_2\), and \(c_3\). This suggests that the network can converge to good results regardless of the values used. Furthermore, the study suggests that estimating the translation component with a reduced amount of data may be possible.

In contrast, for orientation estimation, the parameterisation of \(c_1\)=4, \(c_2\)=14, and \(c_3\)=100% yielded the most accurate results. Decreasing the value of \(c_3\) resulted in slower convergence and an increase in average error in the final step. Additionally, reducing the buffer sample size led to a corresponding increase in error. However, the error for a parameterisation of \(c_1\)=4, \(c_2\)=14, and \(c_3\)=50% was still relatively low, suggesting that some data can be omitted during the training stage.

Table 5 Experimental results of the reservoir buffer with class balance approach for the translation, where \(c_1\), \(c_2\), and \(c_3\) are four, 14 and 100%, respectively
Table 6 Experimental results of the reservoir buffer with class balance approach for the orientation, where \(c_1\), \(c_2\), and \(c_3\) are four, 14 and 100%, respectively

Tables 5 and 6 show the experimental translation and orientation parameter estimation results for each time step and sub-testing set using the reservoir buffer with the class balance approach. The values of \(c_1\), \(c_2\), and \(c_3\) are set to four, 14, and 100% of the training set in the current time step, respectively, and these parameter settings yielded the best performance based on the ablation study.

The model’s error was computed for the unseen sub-test sets across the entire calibration space (S1 to S6) at each time step. The error for a given time step indicated how well the model had performed on the sub-test sets that spanned the space covered by observed training sets. The final row (\(\mathbf {T_6}\)) showed the best performance for each sub-test set because the model had spanned all the calibration space in the class balance CL manner. This suggests that the model had learned to generalise well to new data by incorporating class balance during training.

4.3.3 Experimental results of the reservoir buffer with class balance and camera pose selection

Tables 7 and 8 show the experimental results of the final time step for each testing set (S1 to S6) obtained by employing the reservoir buffer with class balance and camera pose selection approach with two different thresholds. The results in Table 7 indicate that the translation error was lower for the 3 mm threshold for each sub-testing set when compared to the 6 mm threshold, implying that a smaller threshold led to increased accuracy. However, the difference in error between the two thresholds was found to be minimal, with a difference of less than 1 mm. Additionally, it was observed that the 6 mm threshold was able to eliminate 29 camera poses, whereas the 3 mm threshold could only eliminate 17 camera poses. These findings suggest that the 6 mm threshold processed fewer data points while performing with similar levels of error.

Table 7 Experimental results of the reservoir buffer with class balance and pose estimation approach for the translation error (mm) in the final time step

As indicated in Table 8, the final time step orientation error for each testing set was evaluated using both 8 degrees and 4 degrees thresholds. The results showed that a higher error threshold (8 degrees) led to better performance. However, the lower threshold results were still relatively strong, with an average of only 18% or 0.6 degrees worse than the higher threshold. It should be noted that the 8 degrees threshold eliminated 42 camera poses (60%). In comparison, the 4 degrees threshold eliminated 32 camera poses (45%).

Table 8 Experimental results of the reservoir buffer with class balance and pose estimation approach for the orientation error (degrees) in the final time step

4.3.4 Comparison of the CL approaches

Figures 12 and 13 compare the performance of the CL approaches for each testing set, in terms of the translation and orientation components, with the baseline (batch-learning) approach, i.e., the DL-based regression architecture trained from scratch on all training sets without CL. Comprehensive results are additionally provided in Tables 9 and 10.

Fig. 12

This figure compares each testing set’s translation errors for the adopted CL approaches with the baseline approach in the final time step. The baseline approach is the DL-based regression architecture trained from scratch on all training sets without CL

Fig. 13

This figure compares each testing set’s orientation errors for the adopted CL approaches with the baseline approach in the final time step. The baseline approach is the DL-based regression architecture trained from scratch on all training sets without CL

Table 9 This table provides a comparative analysis of the CL approaches with baseline (batch-learning) for the Translation component of the hand–eye calibration in the simulation environment
Table 10 This table provides a comparative analysis of the CL approaches with baseline (batch-learning) for the Orientation component of the hand–eye calibration in the simulation environment

The results demonstrate that the naive CL approach has the poorest performance for translation and orientation error, primarily due to forgetting previously learned knowledge over time. The naive CL approach achieves competitive results only on the final testing subset (S6) because this sub-test set corresponds to the final training subset, with which the naive CL approach last updates the model.

In contrast, the buffer-based approaches, including the class balance approach, did not suffer from the catastrophic forgetting problem. Figures 12 and 13 show no significant performance differences among these approaches for both components. However, the buffer-based approach with camera pose selection allowed for the elimination of unnecessary camera poses, resulting in reduced computational time, as demonstrated in Table 11. This makes it more adaptable for processing stream (online) data. It should be noted that, for simplicity, only the best parameters for the reservoir buffer with the class balance approach were included in the plots. Except for the naive approach, the CL-based approaches have results competitive with the baseline for both translation and orientation, as shown in Tables 9 and 10.

Based on the comparison of CL approaches with the baseline approach, it can be concluded that the buffer-based CL approaches achieve translation and orientation estimation performance competitive with the baseline, as shown in Tables 9 and 10, and they performed better than the naive approach, which suffered from the catastrophic forgetting problem. Moreover, as presented in Table 11, the buffer-based approach with camera pose selection allowed for eliminating unnecessary camera poses, reducing computational time and making it more suitable for processing stream (online) data. The reservoir with the class balance approach has 1.5 mm and 0.5 degrees higher errors for the translation and orientation compared to the baseline [5] while reducing data requirements by 65% (presented in Table 11). The reservoir with the class balance and camera pose selection approach has 0.3 mm and 1.1 degrees higher errors than the baseline, but it used 75% less data (depicted in Table 11). Overall, these results suggest that CL approaches, particularly buffer-based approaches, can improve the accuracy and efficiency of camera pose estimation models.

Table 11 This table presents a comparative analysis of the continual learning (CL) approaches against the baseline (batch-learning) method, focusing on computational requirements and performance within the simulation environment

5 Real-world experiments

5.1 Data collection and split

A Universal Robot 3 with a three-fingered gripper (depicted in Fig. 14) was placed on the table. To span the camera space from different viewpoints, a stereo pair of cameras was positioned in 24 locations, illustrated in Fig. 15. These camera configurations cover three sides of the table on which the robot is mounted, which enables us to consider 90- and 180-degree rotations, including the challenging perpendicular and reflection configurations. These rotations cause a significant change in the appearance of the robot and the environment. To span the robot’s workspace, the end-effector of the robot was moved to 100 configurations for each camera pose.

Fig. 14

This figure visualises a Universal Robot 3 with a three-fingered gripper in the real-world environment

Figure 15 shows the camera configurations, where red and blue dots represent the training (19 camera poses) and testing (5 camera poses) sets, respectively. For the data partition, three subsets (S1–S3) were used, where each side of the table composed one time step. Table 12 shows the time step selection strategy used in the real-world experiments.

Table 12 An overview of the time step for Continual Learning-based approaches in the real-world environment
Fig. 15

This figure shows the camera configurations used in the real-world environment. The 24 camera configurations are divided into three subsets. The red and blue colours represent the training and testing sets, respectively (Colour figure online)

5.2 Experimental design

5.2.1 Naive CL

The naive CL approach processes the data separately for translation and orientation components by employing the DL-based regression architecture [5]. It starts to train the network using the first subset and then updates the trained model until all subsets are covered.

5.2.2 The reservoir buffer with class balance

This approach, similar to the naive CL approach, processes the data separately for the translation and orientation components using the DL-based regression architecture [5]. The training process starts with the first subset, and the model is updated until all subsets are covered. However, unlike the naive approach, this approach utilises Algorithm 1 to augment the training set with past observations. For the real-world experiments, the parameters \(c_1\), \(c_2\), and \(c_3\) are chosen as four, all observed camera poses in the previous time step, and 100%, respectively, based on the simulation results discussed in Sect. 4.3.

5.2.3 The reservoir buffer with class balance and camera pose selection

The reservoir buffer with class balance and camera pose selection approach (Fig. 6) follows a training procedure different from the buffer and naive CL approaches. Initially, the translation and orientation networks are trained using the first subset, which includes six camera poses. In contrast to the previous approaches, the other 13 camera poses are considered independent time steps, a more realistic representation of camera pose changes in real-world applications, which resembles a stream (online) hand–eye calibration.

At the end of the first time step, Algorithm 2 is used to test whether the new camera pose includes novelty. If so, the camera poses with novelty at any time step are used to update the last model using the reservoir buffer with the class balance approach. Specifically, four samples are taken from past novel camera observations to augment the current training set.

To determine the presence of novelty, a threshold is used for the translation and orientation components. Based on experimental results obtained from simulations, the thresholds for the translation and orientation components are set to 3 mm and 4 degrees, respectively.

5.3 Experimental results

This section presents individual and comparison results of each CL approach for the translation and orientation components in the real-world environment.

5.3.1 Naive CL experimental results in the real-world environment

Tables 13 and 14 present the experimental results of the Naive CL approach for the translation and orientation error at each time step and the unseen test set in the real-world environment. The results indicate that there is a good performance for the test set at the current time step, but catastrophic forgetting occurs for the previous time steps. Specifically, the average translation errors (as shown in Table 13) for the S2 and S3 test sets in the final step are 4.14 and 3.49 mm, respectively, while the error for the S1 test set is 31.51 mm.

Table 13 Naive CL experimental results for the translation in the real world
Table 14 Naive CL experimental results for the orientation in the real-world

5.3.2 Experimental results of the reservoir buffer with class balance in the real-world environment

Tables 15 and 16 present the translation and orientation error results for each time step and sub-testing set when using the reservoir buffer with the class balance approach, with parameters four, all observed camera poses in the previous time step, and 100% chosen based on the ablation study results in the simulation environment.

Table 15 Experimental results of the reservoir buffer with class balance approach for the translation in the real-world environment, where \(c_1\), \(c_2\), and \(c_3\) are four, all observed camera poses in the previous time step and 100%, respectively

Table 15 indicates a decrease in translation error as camera poses are processed over time for both the current and previous test sets. Moreover, the average error for each test set in the final step is competitive with the DL-based HEC [5], and the catastrophic forgetting problem observed in the Naive CL approach is resolved.

Table 16 Experimental results of the reservoir buffer with class balance approach for the orientation in the real-world environment, where \(c_1\), \(c_2\), and \(c_3\) are four, all observed camera poses in the previous time step and 100%, respectively

For the orientation component (detailed in Table 16), a trend similar to the translation component is observed in model performance over time, and the impact of the catastrophic forgetting problem is reduced. However, the average results in the final time step remain above the baseline [5], which achieves 1.4 mm and 2.8 degrees for translation and orientation, respectively.

5.3.3 Experimental results of the reservoir buffer with class balance and camera pose selection in the real-world environment

Tables 17 and 18 display experimental results for the translation and orientation components, respectively, in the real-world environment using the reservoir buffer with the class balance and camera pose selection approach. This approach treats each camera pose in the subsets as a separate time step, except for the first subset, resulting in 14 time steps. The buffer parameters are four, all observed novel camera poses in the previous time step, and 100%, while thresholds of 3 mm and 4 degrees are used for camera pose selection for the translation and orientation components, respectively. It should be noted that each experiment was repeated three times to mitigate the effects of stochasticity.

Table 17 Experimental results of the reservoir buffer with class balance and camera pose selection approach for the translation in the real-world environment, where \(c_1\), \(c_2\), \(c_3\), and the threshold are four, all observed novel camera poses in the previous time step, 100%, and 3 mm, respectively

Table 17 demonstrates that the translation error for the first subset increases marginally after the first time step (\(\mathbf {T_1}\)) when new camera poses are considered. However, this increase is only 1 mm and can be tolerated. For S2 and S3, the errors decrease over time. Additionally, the threshold eliminates six camera poses that contain no novelty, which reduces computational complexity.

Table 18 Experimental results of the reservoir buffer with class balance and camera pose selection approach for the orientation in the real-world environment, where \(c_1\), \(c_2\), \(c_3\), and the threshold are four, all observed novel camera poses in the previous time step, 100%, and 4 degrees, respectively

Table 18 shows that the orientation trend is similar to the translation trend for S1, S2, and S3. Although the error on the S3 test set decreases over time, it remains high compared to S1 and S2. Increasing the number of camera poses in that region may address this performance gap.

5.3.4 Comparison of the CL approaches in the real-world environment

Figures 16 and 17 present a comparison of the different CL approaches with the baseline [5] for the translation and orientation components, respectively. Comprehensive results are additionally provided in Tables 19 and 20. The naive CL approach exhibits the worst performance for both the translation and orientation components: compared to the baseline, it is 14 and 31 times worse for translation and orientation on the S1 test set in the final time step. Figure 16 demonstrates that the buffer-based CL approaches achieve translation errors competitive with the baseline. On the other hand, Fig. 17 shows that the buffer-based approaches perform comparably to the baseline for the first two test sets, but there is a performance gap on the final test set. This difference may be due to the camera pose in this set representing the reflection (180-degree rotation), a challenging scenario because it causes a significant change in the appearance of the robot and the environment. The experimental results in simulation (detailed in Sect. 4.3.4) show that CL-based approaches have the potential to achieve comparable performance, and the gap on the final test set could be eliminated by increasing the number of camera poses in that region.

Fig. 16 This figure compares each testing set’s translation errors for the adopted CL approaches with the baseline in the final time step

Fig. 17 This figure compares each testing set’s orientation errors for the adopted CL approaches with the baseline in the final time step

Table 21 also compares the CL approaches with the baseline in terms of computational requirements and errors. The table reveals that the reservoir buffer with class balance CL approach has 1.4 mm and 4.5 degrees higher errors for translation and orientation, respectively, compared to the baseline, while reducing data requirements by 48%. The reservoir buffer with class balance and camera pose selection has 1.0 mm and 2.5 degrees higher errors than the baseline, but it reduces computational requirements by 66%. It should also be noted that the CL-based approaches update the hand–eye calibration model progressively, unlike the baseline approach.

6 Conclusion

This paper presented a Continual Learning (CL)-based approach for hand–eye calibration, demonstrating its potential for extending the calibration space with new observations over time. Three CL-based methods were proposed: a naive approach, a reservoir buffer with class balance, and a camera pose selection approach combined with the buffer. Experimental results in both simulated and real-world environments have shown that the CL-based approaches, except for the naive one, achieved competitive performance compared to the batch learning-based approach. These findings suggest that the hand–eye calibration problem can effectively be addressed as a time sequence problem, enabling the extension of the learned space without requiring complete network retraining with all camera poses. This adaptability in extending the learned space also enhances the hand–eye calibration system’s ability to accommodate changes in camera pose over time.

Table 19 This table provides a comparative analysis of the CL approaches with the baseline (batch learning) for the translation component of the hand–eye calibration in the real-world environment. Units are millimetres
Table 20 This table provides a comparative analysis of the CL approaches with the baseline (batch learning) for the orientation component of the hand–eye calibration in the real-world environment. Units are degrees
Table 21 This table presents a comparative analysis of the continual learning (CL) approaches against the baseline (batch learning) method, focusing on computational requirements and performance within the real-world environment

Real-time updates of robotic system calibration using CL-based hand–eye calibration approaches are crucial for maintaining accuracy over time. This is especially significant in production environments where the robotic system may encounter variations in the environment or the manipulated products, necessitating adjustments to the calibration parameters. By updating the calibration parameters in real time, the system can preserve accuracy and precision throughout its operation, resulting in higher-quality output and improved efficiency. This approach is particularly beneficial in manufacturing, logistics, warehousing, and agriculture, where robots require periodic recalibration and operate over extensive areas.

Integrating a hand motion model into the calibration process could potentially reduce the amount of data required and enhance the efficiency of data collection. Moreover, incorporating different sensors to augment visual information could lead to the development of more effective hand–eye calibration approaches, thereby reducing the data needed to train the network. However, this would increase the model’s complexity and inference time, which is critical for CL-based approaches. To address this, we can leverage subnetwork-based encoding structures [40, 41] for dimensionality reduction, enabling real-time CL-based hand–eye calibration.

Acquiring accurate camera poses with respect to the robot base presents challenges in real-world scenarios. The proposed approach relies on labelled data during the training phase, which can be a challenging and time-consuming task in certain application areas. Furthermore, the success of the approach is highly reliant on the accuracy and reliability of the data labelling process. In future work, we plan to further enhance the CL-based hand–eye calibration (HEC) approach by exploring other learning paradigms, such as reinforcement learning and self-supervised learning. Reinforcement learning can be applied to the hand–eye calibration problem by treating the robot as an agent and the camera and object spaces as the environment. The agent’s objective is to maximise a reward function by manipulating an object detected in the camera space using the current hand–eye calibration parameters. By formulating hand–eye calibration as a reinforcement learning problem, we can develop more flexible and adaptive calibration methods applicable to diverse robotic systems and environments.

In addition, using a variational autoencoder [42] can be explored to learn the robot and camera parameter space distribution in the encoder part. The decoder part can then generate images using encoded features, and the discrepancy between the constructed and real images can serve as a learning signal. This approach can yield more efficient and flexible calibration methods that adapt to changing environments and hardware configurations.

Overall, the presented CL-based approach for hand–eye calibration demonstrates its effectiveness in extending the learned space over time, allowing for real-time updates and preserving accuracy in robotic systems. Exploring alternative learning paradigms and utilising variational autoencoders offer promising directions for further advancement in hand–eye calibration.