1 Introduction

In recent years, production in small and medium enterprises (SMEs) has been moving toward small volumes and a high level of customization. To meet diverse customer needs, production systems have to be agile and flexible, increasing productivity while maintaining consistent product quality [1]. In manufacturing systems, many products are fabricated through casting processes, and obtaining a high-end final product requires contact-based and surface treatment operations such as polishing [2].

In SMEs, many manufacturing steps are still carried out manually by experienced operators, mainly due to the lack of flexible, easy-to-use, and fast-to-configure robotic tools able to adapt to highly variable environments. In contact-based operations, robotic manipulators have been exploited to reduce production costs and increase productivity, but trajectory generation for a workpiece is time-consuming and still represents a bottleneck to the full automation of these processes [3].

In contact-based robotic applications, trajectory planning requires a high level of accuracy, since the robot end-effector has to exert a proper contact force while following the desired trajectory [2]. The CAD model of the workpiece is usually used to generate the geometric path to be followed by the robot tool center point (TCP), as in [4] and [5], where solutions for welding and polishing, respectively, are proposed.

When trajectory planning is based on a digital CAD model of the workpiece, operations are carried out under the following assumptions: the production process is executed in a well-defined and constrained environment, and the workpiece exactly matches its CAD model and is precisely fixed in a known location.

However, these assumptions hardly hold in SMEs, which need to produce and process highly customized products whose CAD model is not always available or does not match the actual state of the workpiece. In addition, the work cells in these environments are very dynamic, and the operator often interacts with the process.

To carry out contact-based robotic operations in such an environment, 3D vision sensors are a feasible tool for generating the working trajectories. They enable the development of flexible, fast, and easy-to-configure tools that map the workpiece within the workspace and, consequently, allow planning the trajectories needed to execute the desired task.

In the literature, 3D vision sensors have mainly been used for operations that do not require contact with the workpiece during processing, involving at most contact while gripping the component. In [6] and [7], object grasping solutions that use 3D vision sensors to capture a single-view 3D image of the robot workspace have been proposed.

Other research works have developed grasping solutions based on multi-view RGB-D images of the robot workspace. In [8], a solution based on the reconstruction of the workpiece point cloud from multi-view images is proposed to estimate the grasping point in pick and place applications. In [9], a neural network-based solution using multi-view RGB-D images was introduced to reconstruct the point cloud of a workpiece, which is then used to determine its grasp pose. For these robotic grasping applications, the effectiveness of the solution is measured by the success of the grasping operation.

3D vision sensors have also been used for the 3D reconstruction of workpieces for quality monitoring purposes. In [10], a registration algorithm is introduced to reconstruct a workpiece point cloud by aligning multi-view point clouds captured with a laser scanner; the accuracy of the output 3D model is assessed by comparing it to a ground truth model. In [11], the authors propose a point cloud 3D reconstruction for quality inspection using multi-view RGB-D images captured by a 3D camera attached to a moving robot arm. At first, a few images are used to construct an initial 3D model that is then updated: if the model is incomplete, showing holes or a non-smooth surface, movement commands are generated to capture images of the missing parts.

Considering solutions for robot trajectory planning, fixed 3D vision sensors have been used as in [12] and [13]. In the former, the trajectory to be followed by a spray gun at a distance between 10 and 15 mm from the workpiece is generated, and the accuracy of the robotic operation is assessed through the quality of the painting task. In the latter, the trajectory to be followed by a glue deposition system is generated to automate robotic glue deposition in footwear manufacturing.

To automate trajectory planning for welding applications, solutions based on multi-view images captured by 3D vision sensors have been proposed. In [14], the trajectory for a robotic welding application is generated by exploiting multi-view point clouds, which are integrated to construct the point cloud of the entire workpiece. In this solution, the assistance of a human operator is necessary to generate the camera capturing poses through a hand gesture-based methodology, and the accuracy of the generated trajectory is assessed by comparing it to a ground truth welding trajectory. In [15], multi-view RGB-D images of a workpiece are used to identify its edges and generate the welding trajectory, and a comparison between different welding seam detection techniques is presented.

To guarantee good execution of contact-based operations, an accurate trajectory and a stable contact force exerted over the surface of the workpiece are required. In contrast to the aforementioned trajectory generation techniques, the solution proposed in this paper evaluates the contact-based robotic operation in terms of both the accuracy of the generated trajectory and the stability of the contact force exerted by the robotic tool during execution, despite inaccuracies due to the depth estimation of a low-cost camera.

This paper introduces the steps needed to automate contact-based robotic operations without a prior digital model of the workpiece, based on the use of a low-cost 3D camera and a built-in force sensor. Specifically, we describe how to integrate RGB-D images to construct a 3D model, extract the workpiece, and select within the workpiece the area to be covered by the robotic tool during the contact-based operation (examples of searching subroutines within the workpiece are shown). The proposed system addresses operations such as surface treatment, welding, and material deposition and investigates the possibility of exploiting low-cost sensors, based on the assumption that inaccuracies in the defined trajectories can be, to some extent, compensated by a closed-loop force control adopted to guarantee the proper contact force.

Following the generated trajectory in position control mode (i.e., by defining only the poses to be reached by the tool center point) may indeed not guarantee a stable contact force between the TCP and the workpiece surface, even when very accurate trajectories are generated. Closed-loop force control should therefore be applied [3, 5, 13], which also compensates for trajectory inaccuracies.

In our previous conference paper [16], we introduced two 3D reconstruction techniques: an odometry-based technique and a technique that calculates the camera poses from the robot poses. The generated 3D models were evaluated by comparing them to the CAD model of the considered workpiece and considering different KPIs that influence usability in industrial applications, such as elaboration time. The focus of the current paper is to analyze the trajectories generated from these techniques. Besides recalling the odometry reconstruction algorithm, we explain how to elaborate the 3D model and how to follow and assess the accuracy of the trajectory, steps that are common to the odometry-based and robot-based techniques. Objects with different geometries, durability, and smoothness are considered in the tests. In addition, technical details of the process parameter configuration, which is common to the two techniques, are given in Section 3.1 (parameter tuning), and different tests and statistical analyses show the effect of the configuration parameters on trajectory accuracy.

The paper is organized as follows: in Section 2, the proposed solution is introduced, focusing on the 3D model reconstruction of the observed scene from multi-view RGB-D images, the workpiece extraction from the 3D model, and the trajectory planning to cover the area of interest. In Section 3, the experimental setup and the tests carried out to assess the performance of the proposed solution in common contact-based operation scenarios are described; reference is made to operations in which a pre-set contact force has to be applied on planar or curved objects. Finally, in Section 4, conclusions are drawn.

2 Proposed solution

In this section, the developed robotic solution is introduced; a summary is shown in Fig. 1.

Fig. 1 Pipeline

Step 1, data acquisition, consists of moving the camera along a scanning trajectory to capture RGB-D images of the working space. In step 2, image matching and pose calculation, the gathered images are analyzed to estimate the camera poses from which they were captured. In step 3, the information contained in the RGB-D images is integrated to reconstruct a 3D model of the observed scene. Step 4 consists of the extraction of the workpiece from the reconstructed 3D model. For trajectory planning, in step 5, the area to be involved in the manufacturing operation is selected to generate the working trajectory. Step 6 covers the transformation of the trajectory into the robot frame and its execution in force control mode.

The system hardware setup shown in Fig. 2 is used to develop the robotic solution. It consists of a 6-axis UR5e collaborative robot [17], a D415 Realsense stereo depth camera [18], and a 3D printed tool used for assessing the accuracy of the trajectories generated from the reconstructed 3D model, as discussed in Section 3. The developed algorithms rely on open-source libraries, namely the Intel Realsense Software Development Kit [18] for camera control and the Open3D library [19] for 3D data elaboration and integration. The point cloud elaboration and the search routines used to select specific areas within the workpiece point cloud have been developed in Python.

Fig. 2 Setup

2.1 Data acquisition

Sensor setup and data acquisition are the first steps in the algorithm development. The Realsense D415 camera captures color and depth images using separate sensors and allows configuring the image quality according to the user's preferences. According to the sensor data sheet [18], the sensors can be configured with resolutions of up to 1280 × 720 pixels for the depth sensor and 1920 × 1080 pixels for the color sensor. Sensor resolution has a significant effect on the output accuracy, and the optimal configuration parameters for the considered case are discussed in Section 3.1.1. At these high-resolution configurations, the sensor can capture images at a frame rate of up to 30 frames per second (fps).

Since color and depth images are captured by different sensors, they may have different resolutions. The first step is therefore to transform the two kinds of images so that they share the same characteristics: the two images are aligned to have exactly the same size and to reference the same physical sensor origin. The alignment procedure, explained in detail in our previous work [13], allows aligning depth to color or vice versa.
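As an illustrative sketch (not the exact configuration used in the experiments), the depth-to-color alignment can be obtained directly with the pyrealsense2 SDK; the stream resolutions and frame rate below are example values only.

```python
import numpy as np
import pyrealsense2 as rs

# Configure the D415 streams (example resolutions; the values actually used are discussed in Section 3.1.1)
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
pipeline.start(config)

# Align the depth frame to the color frame so both share the same size and origin
align = rs.align(rs.stream.color)

frames = pipeline.wait_for_frames()
aligned = align.process(frames)
depth_frame = aligned.get_depth_frame()
color_frame = aligned.get_color_frame()

depth_image = np.asanyarray(depth_frame.get_data())   # 16-bit depth map, in depth units
color_image = np.asanyarray(color_frame.get_data())   # 8-bit BGR image
pipeline.stop()
```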

To acquire the images that allow the 3D reconstruction of an object, all parts/sides of the workpiece have to be covered and visible in the multi-view images. The scanning trajectory planning can be automated to optimize the visibility of the working area where the workpiece is placed. This can be considered a 3D multi-goal path planning problem, in which the multi-view 3D camera poses are determined to optimize the coverage of certain objects. In [20], a solution is proposed to find the optimal path guaranteeing the visibility of a given set of objects.

In this work, the scanning trajectory is planned manually under the following assumption: the workpieces are always fixed inside a predefined area within the robot workspace. The scanning trajectory is therefore planned so that the 3D camera is moved and rotated toward different viewpoints covering the working area, always keeping the camera pointed toward the area where the workpiece is placed. The scanning movement starts by capturing images of the working area from the top; the robot then moves the camera to capture images of the different sides. The resulting trajectory is approximately half of a rectangle, covering three sides of the working area (the side facing the robot base and the two lateral sides). The remaining outer face is not reachable with the robotic setup used, due to the limited reach of the robot; a robot arm with a longer reach could be used to cover it.

The described scanning movements are suitable for scanning flat and curved objects without cavities or undercuts. For objects with more complicated geometries or with undercuts, the scanning movements can be modified by adding camera views that cover the undercuts or occluded parts. Limitations on workpiece size, position, and orientation depend on the reach of the robot used; with a bigger or smaller robot arm, a scaling factor can be applied to cover a correspondingly larger or smaller area.

The 3D camera is synchronized to capture the multi-view images of the working area while following the scanning trajectory. The total length of the scanning trajectory is divided into steps that define where the 3D camera captures an image of the working area. The step size between one image and the following one, in millimeters, influences the total number of images and the overlap between them. Very small steps may lead to an excessive number of images covering similar parts of the object, while very large steps may cause only partial coverage of the object.

Other factors to be considered when planning the scanning trajectory are the robot movement speed, the scanning time, and the 3D camera frame rate. The relationship between all these factors is described by the following system of equations (Eq. 1):

$$\left\{\begin{array}{l}n=\frac{l}{\Delta l}\\ {V}_{r}=\frac{l}{t}\\ fps=\frac{n}{t}\end{array}\right.$$
(1)

where \(n\) is the number of total RGB-D images in the dataset, \(l\) is the length of the scanning trajectory, \(\Delta l\) is the scanning step, \({V}_{r}\) is the robot movement linear speed, \(t\) is the scanning time, and \(fps\) is the frame rate of the 3D camera.

The RGB-D elaboration algorithm is based on the technique provided in [21], where the authors suggest using at most 100 images. Knowing the length of the scanning trajectory and choosing a scanning time that satisfies the production requirements allows calculating the other variables, i.e., the robot linear speed and the camera frame rate. Figure 3 shows a practical example of a scanning trajectory with length \(l=1000\;mm\) and dataset size \(n=100\) images. The figure shows the coordinate systems representing the camera pose at each acquired RGB-D image.

Fig. 3 Acquisition trajectory to capture multi-view RGB-D images of a box

From the first equation of Eq. 1, the scanning step is \(\Delta l=10\;mm/frame\). Supposing a feasible scanning time of \(t=10\;s\), from the second equation of Eq. 1, the robot speed is \({V}_{r}=100\;mm/s\). From the third equation of Eq. 1, the 3D camera frame rate is \(fps=10\) frames per second.
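For illustration, the relations of Eq. 1 can be coded as a small helper; the numbers reproduce the example above.

```python
def scanning_parameters(l_mm, n_images, t_s):
    """Derive scanning step, robot speed, and camera frame rate from Eq. 1."""
    step_mm = l_mm / n_images      # Δl = l / n   [mm/frame]
    v_r = l_mm / t_s               # V_r = l / t  [mm/s]
    fps = n_images / t_s           # fps = n / t  [frames/s]
    return step_mm, v_r, fps

# Example from the text: l = 1000 mm, n = 100 images, t = 10 s
step_mm, v_r, fps = scanning_parameters(1000, 100, 10)
print(step_mm, v_r, fps)   # 10.0 mm/frame, 100.0 mm/s, 10.0 fps
```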

2.2 Image matching and camera pose estimation

The color and depth images contain different information about the observed scene, as they have been captured from different view angles. To create a unique 3D model, it is necessary to refer each image to a common reference frame and then integrate the information contained in the images to create the 3D model. To calculate the motion between the camera poses capturing the RGB-D images of the observed scene, the RGB-D odometry technique [22] is used. This technique compares two images of a static scene captured by a moving camera and determines the homogeneous transformation matrix that, applied to the second image, maps it onto the first one. In our previous paper [16], the end-effector poses from which each image was captured were exploited to integrate all the images and build the 3D model. In the present paper, we discuss in more depth the possibility of reconstructing the camera poses by exploiting the odometry technique, which can be a useful alternative when the poses from which the images are taken are not available (e.g., manual free scanning).

The RGB-D odometry technique [21] determines the camera motion, i.e., the homogeneous transformation matrix, by solving an optimization problem. The objective function used in the optimization problem is shown in Eq. 2.

$$E\left(T\right)=\left(1-\sigma \right){E}_{p}\left(T\right)+\sigma {E}_{g}\left(T\right)$$
(2)

where \(E\left(T\right)\) is the objective function, composed of two terms. The first term, \({E}_{p}\left(T\right)\), accounts for pixel photo-consistency [23, 24] (if the same pixel is visible in two images, it has to be represented by the same color and brightness in both images) [25] and is expressed as squared differences of pixel intensities. The second term, \({E}_{g}\left(T\right)\), is a geometric objective function that measures the error of pixel positions. \(T\) is the homogeneous transformation matrix that transforms one image to the coordinate system of the other; the result of the optimization problem is the optimal \(T\) minimizing \(E\left(T\right)\). \(\sigma \in [0,1]\) is a weighting term that balances the two terms.

Like most iterative closest point registration algorithms, the one used needs an approximate initial value of the transformation matrix [26]. This initial value is usually obtained by a global coarse registration algorithm based on point cloud features. Controlling the step length \(\Delta l\) in Eq. 1 keeps the motion between consecutive images small and the overlapping portion large. A feasible initial value of the homogeneous transformation matrix is therefore the identity matrix, which is refined at every iteration to find the solution minimizing the pixel color and position errors.

The camera reference frame corresponding to the pose from which the first RGB-D image is captured is taken as the common reference frame to which all images are referred. The RGB-D odometry algorithm calculates the camera motion between two images only; if the two images do not have enough overlapping pixels, the motion estimate can be inaccurate. To match all images of the dataset, each image is therefore compared to the following one to find the homogeneous transformation matrix between them. The individual matrices, representing camera motions between consecutive images, are finally composed to compute, for every image, the homogeneous transformation matrix of the cumulative motion referring it to the first image.

The procedure is summarized in Algorithm 1. The input is the dataset of RGB-D image pairs of the working area; the output is an array of homogeneous transformation matrices that align every image to the first image of the dataset. \({H}_{aux}\) is computed considering the objective function in Eq. 2 and applying the optimization procedure from [21]. \({H}_{k}\) is the homogeneous transformation matrix that transforms the \(k\)-th RGB-D image to the coordinate system of the first image; it is given by the cumulative motion of the camera from the first image up to the pose from which the \(k\)-th image was taken.

Algorithm 1 RGB-D odometry
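A minimal sketch of Algorithm 1 built on the Open3D odometry API is shown below. It assumes the dataset is given as lists of color and depth image files and that `intrinsic` holds the camera intrinsics; the accumulation order of the cumulative matrices reflects the convention described above and should be verified against the actual implementation.

```python
import numpy as np
import open3d as o3d

def rgbd_odometry_poses(color_files, depth_files, intrinsic):
    """Return, for every image, the transformation referring it to the first image (sketch of Algorithm 1)."""
    option = o3d.pipelines.odometry.OdometryOption()
    # Hybrid Jacobian: photometric + geometric terms, as in Eq. 2
    jacobian = o3d.pipelines.odometry.RGBDOdometryJacobianFromHybridTerm()

    def load_rgbd(c_path, d_path):
        color = o3d.io.read_image(c_path)
        depth = o3d.io.read_image(d_path)
        return o3d.geometry.RGBDImage.create_from_color_and_depth(color, depth)

    H_list = [np.identity(4)]              # the first image defines the common reference frame
    previous = load_rgbd(color_files[0], depth_files[0])
    for k in range(1, len(color_files)):
        current = load_rgbd(color_files[k], depth_files[k])
        # H_aux maps the current image into the frame of the previous one (identity as initial guess)
        success, H_aux, _ = o3d.pipelines.odometry.compute_rgbd_odometry(
            current, previous, intrinsic, np.identity(4), jacobian, option)
        # Cumulative motion: refer image k to the frame of the first image
        H_k = H_list[-1] @ H_aux if success else H_list[-1]
        H_list.append(H_k)
        previous = current
    return H_list
```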

2.3 Volume integration for 3D model construction

All the multi-view color and depth images of the working area are integrated to create a unique 3D reconstruction of the observed scene. The volume integration process, introduced in [27, 28], is based on the generation of a voxel grid representing the scene, in which the color and depth information of each pixel of each RGB-D image is integrated and referred to a global reference frame. The voxel grid is therefore a three-dimensional representation of all the information contained in the two-dimensional color and depth images. From the voxel grid, after applying the truncated signed distance function (TSDF), the isosurface representing the scene is extracted. The isosurface is the smooth, continuous surface interpolating the non-empty voxels of the grid.

The procedure consists of the following steps. A dataset of (\(1,\dots ,n\)) RGB-D images is discretized into a voxel grid. Then, the signed distance functions (\({s}_{1}\left(x\right),\dots ,{s}_{n}\left(x\right)\)) of each voxel \(x\) are calculated for the \(i\)-th RGB-D image. These values represent the distance of a voxel from the nearest surface along the camera's line of sight. Voxels between the observed surface and the camera origin have a positive value that increases as the voxel gets closer to the camera; voxels on the observed surface have a value of zero, while non-visible voxels have a negative distance.

The depth measurements are subject to noise, and two depth measurements of the same point, taken with the same 3D camera and from the same point of view, may have different values. To better estimate the depth of the observed surface, the signed distance values from the different images are averaged into a cumulative signed distance function \(S(x)\). Since the same point seen from two different view angles may yield different values, more weight has to be given to the depth value of the image that covers that point better. Weight values (\({w}_{1}\left(x\right),\dots ,{w}_{n}\left(x\right)\)) are therefore assigned to each voxel \(x\) for the \(i\)-th RGB-D image; each weight measures the degree of certainty of the depth measurement of a point. The weight depends on the orientation of the surface normal with respect to the viewing direction: if the camera line of sight is aligned with the surface normal, the weight is highest, and it decreases as the angle between them increases. This relation can be represented as the dot product between the two vectors.

The weighted integration of all RGB-D images is given by Eqs. 3 and 4, where \(S\left(x\right)\) is the global signed distance value for every voxel of the integrated scene and \(W\left(x\right)\) is the global weight for every voxel. Discretizing the functions \(S\left(x\right)\) and \(W\left(x\right)\) in a voxel grid allows calculating the zero-crossing, i.e., the isosurface where \(S\left(x\right)=0\), which describes the observed scene.

$$S\left(x\right)=\frac{\sum_{i=1,\dots ,n}{s}_{i}\left(x\right){w}_{i}\left(x\right)}{\sum_{i=1,\dots ,n}{w}_{i}\left(x\right)}$$
(3)
$$W\left(x\right)=\sum_{i=1,\dots ,n}{w}_{i}\left(x\right)$$
(4)

Practical tests show that the grid voxel size affects the accuracy of the reconstructed 3D model. In Fig. 4, a color image of an object and the point clouds obtained with two different voxel sizes are compared.

Fig. 4 Voxel size effect on points density

In Fig. 4a, a color image from the dataset used for the 3D reconstruction is shown. In Fig. 4b, the point cloud obtained with a large voxel size is shown, while in Fig. 4c, the point cloud obtained with a smaller voxel size is shown. The smaller voxel size allows reconstructing a 3D model with a higher level of detail and hence a higher degree of similarity to the real object.

2.4 3D model elaboration and trajectory generation

The result obtained from the integration process is the 3D model of the observed scene in the form of a point cloud. For every point, the point cloud contains the coordinates with respect to the reference frame (i.e., the one corresponding to the camera in the first image), the color as RGB values, and the normal vector. In the considered setup, the xy-plane is parallel to the workbench where the robot and the workpiece are placed. The z coordinates represent the distance of each point from the workbench, with higher z values corresponding to more distant points. These values are exploited by the searching algorithm, aimed at identifying and selecting the area of the workpiece to be considered in the contact-based operation. The procedure is described in Algorithm 2.

Algorithm 2 3D model elaboration and trajectory generation

After the reconstructed 3D model is built by combining the color and depth images of the scene covered by the camera during the scanning process, a filtering process is applied to remove irrelevant points from the reconstructed 3D model. The point cloud can be trimmed by removing points far from the camera that represent unrelated background. In Algorithm 2, the filtering process evaluates the depth value, i.e., the distance of a point from the working table on which the robot is placed; if the distance is greater than a predefined threshold, the point is excluded from the point cloud.

To find the workpiece in the remaining part of the point cloud, a plane segmentation process is applied to identify the working table on which the workpiece is placed. Clusters of points forming a plane are found (neighboring points having depth within a defined range, i.e., 2 mm, a value accounting for the inaccuracies of the 3D camera in measuring the depth of the flat surface of the working table). The points forming the working table correspond to the biggest planar cluster found; removing these points removes the working table from the overall point cloud, leaving only the workpiece. Plane segmentation is performed with a RANSAC search algorithm. RANSAC is an iterative algorithm that samples a subset of the dataset and uses it to fit a model (in the considered case, a plane). All points of the dataset are then checked to evaluate whether they fit the model. If the number of points fitting the model is lower than a defined threshold, the process is repeated by sampling other points and calculating a new plane model, until the model that fits most of the points in the dataset is found. The plane search algorithm is configured by three parameters: the distance error, i.e., the maximum distance for a point to be considered part of the plane; the number of points randomly sampled from the dataset to estimate the plane; and the number of iterations used to sample points, estimate, and verify the plane. The values used are \(Error=2\;mm\), \(points=3\), and \(iterations=1000\).

The result is the plane equation coefficients \(a\), \(b\), \(c\), and \(d\) satisfying, for every point of the plane having the coordinates \({x}_{p}\), \({y}_{p}\), and \({z}_{p}\), the plane equation \(a{x}_{p}+b{y}_{p}+c{z}_{p}+d=0\).
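As a sketch of this step, Open3D provides RANSAC plane fitting directly; the parameters mirror the values reported above (2 mm distance error, 3 sampled points, 1000 iterations), assuming the point cloud is expressed in meters.

```python
import open3d as o3d

def remove_working_table(pcd):
    """Fit the dominant plane (the working table) with RANSAC and remove its points."""
    (a, b, c, d), inlier_idx = pcd.segment_plane(
        distance_threshold=0.002,   # 2 mm distance error
        ransac_n=3,                 # points sampled to estimate each candidate plane
        num_iterations=1000)        # sampling/verification iterations
    # Plane equation: a*x_p + b*y_p + c*z_p + d = 0
    workpiece = pcd.select_by_index(inlier_idx, invert=True)   # drop the table points
    return workpiece, (a, b, c, d)
```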

Once the plane equation is found, the points forming the plane are selected and excluded from the point cloud, leaving only the workpiece points. Knowing the dimensions of the working area where the workpiece is placed and its position with respect to the camera coordinates, and hence the origin of the point cloud, the coordinates of the remaining points are checked to verify whether they fall within the working area. Upper and lower bound values are used for this check, and points outside the bounds of the working area are excluded from the point cloud. The considered points are those satisfying the conditions:

$$\left\{\begin{array}{l}{x}_{low\;bound}<{x}_{p}<{x}_{upper\;bound}\\ {y}_{low\;bound}<{y}_{p}<{y}_{upper\;bound}\end{array}\right.$$
(5)

where \({x}_{p}\) and \({y}_{p}\) are the point coordinates, and \({x}_{low\;bound}\), \({x}_{upper\;bound}\), \({y}_{low\;bound}\), and \({y}_{upper\;bound}\) are the boundaries of the working area in the camera coordinates.

To remove sparse outlier points far from the workpiece, caused by sensor noise or by the 3D reconstruction process, a statistical outlier removal filter is applied. The filter removes points far from their neighbors: it computes the mean distance between each point and its n nearest neighbors and, assuming a Gaussian distribution of these distances, excludes points whose mean distance falls outside a predefined interval defined in terms of the average distance and standard deviation.
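The workspace cropping of Eq. 5 and the statistical outlier removal can be sketched as follows; the bound values and filter parameters are placeholders to be tuned for the actual cell.

```python
import numpy as np
import open3d as o3d

def crop_and_denoise(pcd, x_bounds, y_bounds, nb_neighbors=20, std_ratio=2.0):
    """Keep only points inside the working area (Eq. 5) and remove sparse outliers."""
    pts = np.asarray(pcd.points)
    inside = ((pts[:, 0] > x_bounds[0]) & (pts[:, 0] < x_bounds[1]) &
              (pts[:, 1] > y_bounds[0]) & (pts[:, 1] < y_bounds[1]))
    cropped = pcd.select_by_index(np.where(inside)[0].tolist())
    # Statistical outlier removal: points whose mean neighbor distance deviates
    # too much from the average (Gaussian assumption) are discarded
    cleaned, _ = cropped.remove_statistical_outlier(
        nb_neighbors=nb_neighbors, std_ratio=std_ratio)
    return cleaned
```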

Different subroutines have been developed to select the part of the workpiece to be covered in the operation. An example is the case in which a central area of the workpiece has to be covered, as shown in Fig. 5c. To select that area, the coordinates of the points remaining after outlier removal are used. The first step is to calculate the centroid of the set of points considering the x and y coordinates: for a set of points \({P}_{1},\dots ,{P}_{n}\) with coordinates ((\({x}_{1},\dots ,{x}_{n}\)),(\({y}_{1},\dots ,{y}_{n}\))), the centroid is given by Eq. 6.

Fig. 5 Point cloud elaboration, object extraction, and trajectory generation. a Point cloud of the reconstructed scene, b extraction of workpiece, and c trajectory generation

$$centroid=\left(\frac{{x}_{1}+{x}_{2}+\dots +{x}_{n}}{n},\frac{{y}_{1}+{y}_{2}+\dots +{y}_{n}}{n}\right)$$
(6)

After calculating the centroid, offsets along the x and y axes are used to define the dimensions of the area to be selected. The points within the limits of that area are selected, and the rest are excluded.
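A sketch of this central-area searching subroutine (Eq. 6), with the x/y offsets as user-defined parameters:

```python
import numpy as np

def select_central_area(points_xyz, x_offset, y_offset):
    """Select the points within a rectangle centered at the x-y centroid of the workpiece (Eq. 6)."""
    cx, cy = points_xyz[:, 0].mean(), points_xyz[:, 1].mean()   # centroid of the x and y coordinates
    inside = ((np.abs(points_xyz[:, 0] - cx) <= x_offset) &
              (np.abs(points_xyz[:, 1] - cy) <= y_offset))
    return points_xyz[inside]
```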

In other cases, when the operation targets the edges of the workpiece, for example in polishing operations, the edges can be found by elaborating the normals of the workpiece point cloud and selecting the areas whose neighboring points show a large variation in normal direction [29].

The selected portion of the workpiece consists of a high number of closely spaced points; before generating the working trajectory, it is therefore necessary to downsample the point cloud. This removes points that are too close and enforces a minimum distance between consecutive points, which helps the robot controller to better interpolate the movement commands and, in the applications considered, to achieve good contact force control.

Based on the operation to be executed, the order in which the selected points are followed can be changed, and parameters such as movement speed and contact force have to be defined to generate a trajectory that guarantees good execution performance.
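As an illustrative sketch (one possible ordering, not necessarily the one used in the experiments), the selected points can be downsampled and ordered row by row into a zigzag sequence of waypoints:

```python
import numpy as np
import open3d as o3d

def make_zigzag_waypoints(points_xyz, spacing_m=0.005, row_width_m=0.005):
    """Downsample the selected area and order the points row by row, alternating direction."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_xyz)
    pts = np.asarray(pcd.voxel_down_sample(voxel_size=spacing_m).points)  # enforce a minimum point spacing
    rows = np.round(pts[:, 1] / row_width_m)             # group points into rows along y
    order = np.lexsort((pts[:, 0], rows))                # sort by row, then by x within each row
    pts_sorted, rows_sorted = pts[order], rows[order]
    waypoints = []
    for i, r in enumerate(np.unique(rows_sorted)):
        row_pts = pts_sorted[rows_sorted == r]
        if i % 2 == 1:                                   # reverse every other row -> zigzag path
            row_pts = row_pts[::-1]
        waypoints.append(row_pts)
    return np.vstack(waypoints)
```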

Figure 5 shows the process of 3D model elaboration to generate the trajectory for the contact-based operation.

In Fig. 5a, the overall point cloud of the observed scene including the working table and the workpiece is shown. In Fig. 5b, the workpiece considered in the contact-based operation is shown, and in Fig. 5c, the trajectory targeting the central area of the upper surface of the workpiece is shown.

In other cases, the operation may target one of the lateral faces of the workpiece. An example is shown in Fig. 6: in Fig. 6a, the left face of the workpiece is selected; in Fig. 6b, the front face; and in Fig. 6c, the trajectory covering the right face. Note that the normal vectors are available for every point; by comparing the variation of their inclination among neighboring points, it is also possible to detect the edges of the point cloud, which is useful for polishing operations.

Fig. 6 Example of searching routine: workpiece lateral faces selection

Since the integration algorithm elaborates both color and depth images, the 3D model of the scene also preserves the color information. This allows selecting parts of the object by searching for areas having a specific color, which is useful for surface finishing and cleaning. In Fig. 7, an example of a workpiece with a colored spot is shown: in Fig. 7a, the workpiece is shown; in Fig. 7b, the generated trajectory covering the colored spot is highlighted by the blue circle; and in Fig. 7c, the robot end-effector is shown while following the trajectory.

Fig. 7 Example of searching routine: workpiece with colored spot

2.5 Trajectory transformation and force control configuration

The part of the workpiece to be covered by the robot end-effector is represented as a point cloud referred to the reference frame of the camera taking the first RGB-D image. To drive the robot end-effector so that it touches the trajectory points, a homogeneous transformation matrix \({H}_{CT}\) is calculated. \({H}_{CT}\) is obtained from the geometrical details of the 3D camera support attaching the camera to the robot end-effector, the origin of the captured images with respect to the 3D camera frame, and the robotic tool dimensions. This homogeneous transformation matrix transforms the trajectory poses from the camera frame \(C\) to the robot end-effector frame \(T\) shown in Fig. 2.
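A sketch of this transformation step, assuming \({H}_{CT}\) is the 4 × 4 matrix described above and the trajectory is an N × 3 array of points expressed in the camera frame:

```python
import numpy as np

def transform_trajectory(points_cam, H_CT):
    """Map trajectory points from the camera frame C to the robot end-effector frame T."""
    homogeneous = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])  # N x 4 homogeneous coordinates
    return (H_CT @ homogeneous.T).T[:, :3]                                    # back to N x 3 in the T frame
```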

In contact-based robotic operations, a constant force must be applied on the workpiece surface to obtain consistent outcomes. The generated trajectory, referred to the robot tool reference frame, is followed in force control mode along the axis \({Z}_{T}\), which points toward the workpiece surface. The force control loop adjusts the z value of each trajectory point, moving it upward or downward to keep the contact force as stable as possible and close to the desired setpoint. The Universal Robots manipulator is equipped with a built-in force sensor and a built-in PID force controller, which are used in the considered experiments.
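The experiments rely on the UR5e built-in PID force controller; purely as a conceptual sketch of how force feedback adjusts the z coordinate along \({Z}_{T}\), a simple proportional correction could look as follows (the `move_to` and `read_force_z` interfaces are hypothetical placeholders, not part of the actual system).

```python
def follow_with_force_control(waypoints, f_target, move_to, read_force_z, kp=0.0005):
    """Conceptual sketch only: proportional z correction to keep the contact force near the setpoint."""
    z_offset = 0.0
    for x, y, z in waypoints:
        move_to(x, y, z + z_offset)                # command the nominal pose plus the current correction
        error = f_target - read_force_z()          # positive when the measured force is below the setpoint
        z_offset += kp * error                     # move toward/away from the surface (sign depends on the Z_T convention)
    return z_offset
```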

The force control is also used for the accuracy assessment of the generated trajectory: a stable contact force while following the trajectory means that trajectory inaccuracies (e.g., due to the depth estimation of the 3D camera) have been compensated. A comparison between the generated trajectory and the one actually followed is used for the accuracy evaluation.

Several workpieces have been considered for experimentation (e.g., cardboard box, 3D printed flat object, 3D printed curved object, plastic container). The characteristics of the workpiece such as smoothness and durability highly influence the contact force behavior and error compensation.

3 Experimental setup and tests

In this section, a set of tests and experiments is presented to evaluate the performance of the developed algorithm. We consider cases that typically occur when contact-based robotic operations have to be executed on an object without any prior knowledge: operations on planar objects, operations on curved objects, and operations where a proper contact force has to be applied to guarantee correct execution. We also consider the effect that some parameters, such as the camera resolution used to capture the RGB-D images and the voxel size used during image integration, have on the accuracy of the output of the developed algorithm.

The setup used during the experiments is the one shown in Fig. 2. To assess the accuracy of the trajectory generated from the reconstructed 3D model, the 3D printed robotic tool shown in the figure is moved toward the reconstructed object following the generated trajectory. When the tool tip touches the object, the actual position is compared to the theoretical one generated from the 3D model, and the difference between the two is used to calculate the error. This measurement is directly influenced by the rigidity of the workpiece. To overcome this limitation, the end-effector tip is stopped as soon as the load cell on the robot wrist starts measuring a contact force. Moreover, objects with different rigidity are tested, i.e., a cardboard box with relatively low rigidity and polylactic acid (PLA) 3D printed objects with higher rigidity. This experiment exploits the Universal Robots UR5e built-in force-torque sensor [17].

As a first test case, consider the one in which a robotic contact-based operation has to be executed over the center of the upper face of the cardboard box workpiece in Fig. 5. The working trajectory, selected from the workpiece point cloud and generated according to Section 2.4, is the one highlighted in red in Fig. 5c.

Applying force control and, in particular, a touch-stop function in which the robot stops as soon as it touches the workpiece, every point of the trajectory is reached separately, thus allowing the evaluation of its accuracy. Figure 8 compares the trajectory points generated from the 3D model (blue line) with the tooltip feedback position obtained when it reaches the workpiece surface at each point of the trajectory (actual trajectory depth, orange line). The lines in the graphs connect the values measured by reaching every single point of the trajectory separately; the values indicate the height of the object at every point reached. The dashed lines report the mean values of the theoretically generated and the actually executed trajectories, highlighting a mean error of about 1.1 mm. The maximum discrepancy between the two curves is 3 mm and the minimum is 0.16 mm, as reported in Fig. 8b.

Fig. 8 Plane surface depth accuracy evaluation

The values extracted from the reconstructed 3D model (blue line in Fig. 8a) have a relevant variance between consecutive points, due to both the 3D vision sensor accuracy in estimating the depth values and the voxel size considered in reconstructing the 3D model. However, as further commented in the following, this can be acceptable in contact-based applications where a certain geometrical error can be compensated thanks to the force control closed loop, keeping the desired contact and thus guaranteeing the correct trajectory.

3.1 Parameter tuning

3.1.1 Camera acquisition accuracy

The Intel Realsense D415 has several resolution configurations for color and depth sensors to be set by the user. To obtain the best resolution of the depth sensor, it is recommended to use 1280 × 720 pixels.

Changing the resolution settings has a direct effect on the minimum allowed distance between the depth sensor and the observed scene. This distance is the one at which the depth processor starts to measure the depth values. For the maximum resolution of 1280 × 720 pixels, the minimum distance is 450 mm between the sensor and the scene. Decreasing the resolution to 848 × 480 pixels, the minimum distance decreases to 310 mm.

In the application considered, where the D415 depth camera is moved by the robot arm to capture the images of an object, the required minimum distance limits the maximum height that the scanned workpiece can have, since the height reachable by the extended robot arm is limited.

Tests were carried out to compare the resolution configurations used to capture the multi-view images. The resolutions considered are 1280 × 720 and 848 × 480 pixels, and the 3D model reconstruction uses a voxel size of 5 mm. For each configuration, 10 different datasets of RGB-D images were acquired to assess the repeatability and consistency of the proposed method. These images were used to reconstruct 10 3D models of the same object, which were then used for trajectory generation, execution, and error measurement using the force control (touch-stop) method explained at the beginning of this section. The output of each of the 10 tests is similar to the plots in Fig. 8b: the error for each point of the trajectory is calculated, and these individual errors are used to compute the overall mean error of the trajectory.

Results are shown in Fig. 9, in which the blue dots correspond to the mean error values of each of the 10 generated trajectories. The mean error for the lower resolution fluctuates between 1.5 and 1.72 mm, while for the higher resolution it lies between 1.15 and 1.3 mm.

Fig. 9 Error mean value changing camera resolution

Figure 10 reports the statistical analysis of the trajectories generated, considering every acquisition of the 10 tests for both the lower (Fig. 10a) and higher resolution (Fig. 10b). Data distribution is shown in terms of quartiles, median, maximum, and minimum values.

Fig. 10 Statistical analysis summary

Increasing the resolution improves the accuracy of the generated trajectory, but with the drawback of decreasing the maximum allowed height of the workpiece.

3.1.2 Voxel size effect on integrated volume using the truncated signed distance function (TSDF)

The estimated surface describing the observed scene is generated by transforming the depth values stored in a voxel grid into an isosurface, as explained in Section 2.3. The accuracy of that surface depends on the dimension of the voxels forming the grid, since voxel values are obtained by averaging depth values from the depth images.

A bigger voxel dimension means that the depth values contained in a larger area are averaged to calculate the single value representing the cell of the grid; hence, the output value is farther from the real value. A smaller size instead considers the depth values of a smaller area and therefore yields a more accurate approximation.

Figure 11 shows the errors evaluated for two different voxel sizes adopted in the reconstruction of the 3D models from which trajectories are extracted. Results correspond to the highest resolution of 1280 × 720 pixels. For the bigger voxel size of 5 mm, whose 3D model reconstruction is shown in Fig. 4b, higher errors are obtained, both in mean value and maximum value (maximum error up to 1.35 mm). On the other hand, for the smaller voxel size of 1.3 mm, whose 3D model reconstruction is shown in Fig. 4c, lower errors are obtained (maximum error being equal to 0.8 mm).

Fig. 11 Voxel size effect on object height error

3.2 Performance evaluation

In this section, the performance of the generated mesh model is evaluated, considering surface geometries that are typically found when contact-based operations are executed (i.e., planar and curved shapes). The parameters used for both data acquisition and 3D model reconstruction are the optimal ones emerging from the analysis of the previous paragraphs, i.e., a camera resolution of 1280 × 720 pixels and a voxel size of 1.3 mm.

3.2.1 Depth evaluation of a plane shape workpiece

The object shown in Fig. 4a is considered as an example of an object with a planar surface on which the contact-based operation has to be executed. The procedure is the one explained above: the end-effector is driven toward each point of the generated trajectory (covering the central area), one by one. The object used has a higher rigidity than the object shown in Fig. 5, whose results are shown in Fig. 8.

The object has a known height of 173 mm. In Fig. 12, the evaluation results are shown. The blue line represents the height extracted from the reconstructed 3D model. The orange line represents the real height calculated from the measured robot TCP position, reaching every point of the trajectory. The average error is less than 0.5 mm, and the maximum one is equal to 0.8 mm.

Fig. 12 Height comparison of a plane object

3.2.2 Depth evaluation of a curved shape workpiece

This section focuses on the capability of the developed code to define a robot trajectory to be followed on a curved surface, another type of shape commonly found in contact-based operations. To carry out the analysis, the curved object represented in Fig. 13a is adopted. Figure 13b reports the corresponding reconstructed 3D model; the points defining the trajectory to be tested on the curved surface are highlighted in red.

Fig. 13 Curved object and trajectory generated

Figure 14 shows the test results. Figure 14a shows the comparison between the estimated height (blue line) and the actually measured height (orange line).

Fig. 14 Curved object results

Figure 14b shows the corresponding error, having a mean value of 1.85 mm, minimum value of 1.2 mm, and maximum value of 2.2 mm.

In comparison with the values achieved for the plane surface, the reconstructed 3D model of the curved object has a lower accuracy.

3.2.3 Contact force behavior evaluation

In most of the contact-based robotic applications (e.g., polishing and surface treatment), the contact force applied by the tool on the workpiece surface must be kept rather close to the target value to guarantee the proper execution of the task. For this reason, the use of a force control loop is highly recommended in these applications, possibly exploiting a load cell on the robot’s wrist. In such a case, the activation of the force control also enables compensation of the error in the generated trajectory, so that the low accuracy/low-cost sensor exploited in this work can be accepted, with a reduction in the overall cost of the application.

As a final validation step, this subsection presents the tests executed with a force control loop, following the trajectories generated and discussed in the previous sections. Different contact forces are applied to the planar surface object of Fig. 4a, with the purpose of demonstrating that the low-cost sensor adopted in this work and the described procedure are suitable for generating trajectories accurate enough to guarantee the target contact force.

Figure 15 shows the contact force measured during the execution of the zigzag trajectory of Fig. 5. The same working trajectory is followed by applying different contact force set points (2, 5, and 10 N).

Fig. 15 Contact force analysis applying different values

The analysis of the force values in Fig. 15 shows that the mean value in the three cases is equal to the setpoint, with some contact loss occurring only in the 2 N case.

4 Conclusion

This paper discusses the possibility of exploiting low-cost cameras for trajectory planning in robotic contact-based applications in which a 3D model is not available in advance.

The proposed methodology is based on the integration of multi-view RGB-D images of the workpiece captured using a low-cost 3D camera. Compared to solutions exploiting more accurate but much more expensive sensors such as laser scanners, this solution can be more feasibly adopted in small and medium enterprises thanks to its low cost, and it also allows quick reconfiguration of the robotic cell. Color and depth images are integrated to generate the 3D model without prior knowledge of the workpiece, so that trajectories for the robot end-effector can be generated in the absence of a CAD model. The camera pose information can either be used directly for the 3D reconstruction [16] or estimated through the RGB-D odometry technique; in the latter case, the presented methodology can also be used for free scanning when a robot cannot be exploited, or for quality monitoring of workpieces.

Experimental results show how to tune the relevant parameters that improve the accuracy of the reconstructed 3D model and hence of the trajectory. The experimental tests also showed that the proposed solution can be properly applied to planar and curved workpieces, obtaining an error between the generated and the actually followed trajectory of \(\le\) 2 mm.

The exploitation of a contact force control to keep a proper contact force between the tool and the object surface allowed compensating positional errors, in the order of a few millimeters, between the generated and the actually required trajectory.

In the proposed solution, the scanning trajectory is chosen offline without considering the workpiece shape. Future work will address an adaptive scanning trajectory that, based on the reconstructed 3D model, generates viewpoints covering undercuts, cavities, or occluded parts of the workpiece.