1 Introduction

The use of stay-in-place formworks is one way to reduce construction costs and cut down on one-time concrete forms, which produce extensive material waste (Lloret-Fritschi et al. 2020). Stay-in-place formworks function as structural reinforcement and may also serve aesthetic purposes (Hack et al. 2017), allowing for design freedom through free-form and complex shapes of the built structures. One application example is concrete spraying or shotcreting, where rebar meshes are used as a basis. It is a highly industrialized construction method, used primarily in modern tunneling and slope reinforcement, in either a manual or a semi-automatic mode (Fig. 1). The high-velocity application of concrete results in a highly consolidated material (Beaupre 1994). Recent advances in robotics in architecture and civil engineering have pushed the development of robotic spraying, where the goal is to automate the process as much as possible. Examples include spraying of plaster onto flat walls (Jenny et al. 2020) or of concrete onto carbon fibre or reinforcement meshes (Taha et al. 2019) (Fig. 1). In this paper we focus on the latter. To automate the process, the first step is to detect the overall location and shape of the mesh. This step is crucial for the spraying process and provides the input for spray trajectory generation. Accurate spray path trajectories are needed to apply the material at a specific, constant spraying distance, and to avoid collisions between the robot arm and the mesh.

Fig. 1

Manual spraying onto a rebar mesh during skate park construction, Spohn Ranch© (left). Wall reinforcement with semi-automatic spraying using a shotcrete machine robot, SIKA© (middle). Robotic concrete spraying setup (Taha et al. 2019) (right)

Particular challenges include the handling of production errors (e.g., material falling down, insufficient layer thickness) or of unknown or imprecise work-piece geometries (Wangler et al. 2016; Studio 2017). One solution is to include adequate equipment, sensors, data processing, and feedback control in the setup, integrating real-time (geometric) information into the fabrication process. State-of-the-art robotic processes in digital fabrication thus include sensors such as depth cameras, laser scanners, or stereo camera systems. However, we think their use is not yet sufficiently scrutinized (Schuler and Sawodny 2019; Wang et al. 2020).

One way to digitally reconstruct thin objects, such as rebar meshes, is to scan them using laser scanners. However, the acquired data are usually heavily influenced by mixed pixels, a bias occurring when the laser beam hits surfaces at different distances simultaneously (Chaudhry et al. 2021), i.e., in this case the rebar structure and the background. Kim et al. (2020) proposed an approach for automatic rebar dimensional inspection based on terrestrial laser scanning (TLS), implementing a noise reduction and mixed pixel removal algorithm. Furthermore, Wang et al. (2017) make use of both 3D and color features to train a support-vector machine classifier that extracts the points corresponding to the rebars. This approach relies substantially on the information provided by RGB images, which represents a further major disadvantage and is discussed in the next paragraph. Scanning of wire objects is also being researched for applications such as the inspection of power lines using mobile mapping data, see, e.g., Sánchez-Rodríguez et al. (2019). Approaches using TLS data suffer from the challenging registration of data sets acquired from multiple views, from occlusions, which are especially problematic in dense rebar mesh layouts, and from the computationally intensive, time-consuming processing of large data sets. In addition, some of the mentioned approaches rely on prior information for the reconstruction of objects, e.g., the circular cross section of the wire and its diameter, so the user has to adapt the processing code whenever the wire specifications change. When acquiring from a larger distance, the scanner resolution and the low number of points on the wire may cause further challenges. Overall, scanning of rebar meshes is not yet a solved problem.

Another way to extract rebar meshes is using (stereo) RGB camera systems. An example of such a system was developed in the mesh mould project (Giftthaler et al. 2017), where the position of the steel wire relevant for the next construction step had to be identified such that the robot tool could clamp onto it. Damage inspection of steel-reinforced concrete structures is another research domain where the extraction of rebars is relevant. A few approaches exist in the literature (Xu et al. 2019; Miao et al. 2021) where areas with exposed rebars are detected in single RGB images. These approaches detect faulty areas by identifying the shape of the rebar wire in the image as well as by using other cues of the exposed and damaged concrete structure. The major disadvantage of all RGB image-based methods is that they depend heavily on the external illumination. This significantly reduces the measurement robustness for applications in challenging environments with changing illumination conditions, such as construction sites or fabrication workshops. Furthermore, in the case of stereo systems, the need for a common field of view of the two cameras and occlusions reduce the flexibility of the method.

In this paper, we build on top of the two mentioned ways of rebar detection: we apply image processing steps to depth images for 3D rebar mesh digitization. By this we are able to cut down on long processing times, as well as mitigate the mixed pixel effect on the final results. The starting point for processing the acquired depth images is an approach known in computer vision for applications where lines have to be extracted in near real time, e.g., road line extraction (Deng and Wu 2018) or automatic Sudoku solving with grid recognition (Kamal et al. 2015). We use commercially available time-of-flight (ToF) depth cameras to acquire the geometric data with an accuracy of a few mm over distances of up to 6 m, based on the operating principle described in Hansard et al. (2012). The 3D observations result from an active illumination of the environment with a near-infrared light source, making the acquisition largely independent of the external illumination. In addition, the measurements are done from a single viewpoint, which eliminates the common-field-of-view challenge of stereo systems.

Registration or alignment of point clouds acquired at various scanner locations requires a set of transformation parameters, i.e., translations and rotations (Friedli and Wieser 2016). In a robotic setup, where sensors can be mounted on the robot arm, this challenge can be solved by calibrating the system and determining the hand–eye (HE) calibration parameters (Wang et al. 2020). Several approaches exist in the literature where a checkerboard target is acquired by either RGB or depth cameras, the latter yielding intensity in addition to 3D information (Tsai and Lenz 1989; Avetisyan et al. 2014; Wu et al. 2019). The purpose of a 2D checkerboard target is to provide a stable reference between the camera acquisitions, each of which allows estimating the respective position and orientation of the camera (Albarelli et al. 2009). Furthermore, HE calibration approaches using a 3D calibration field exist for cases when only 3D information is obtained with a depth camera (Yang et al. 2018; Kahn et al. 2014). Those are generally less accurate than the image-based solutions, because of the lower accuracy of the 3D sensors compared to the achievable accuracy with RGB cameras.

The main contributions of our paper are (i) a pipeline for accurate and efficient digital reconstruction of rebar meshes including an experimental evaluation of the accuracy, and (ii) an innovative sequence of steps for detecting the mesh in space within a robotic setup. The pipeline for rebar mesh reconstruction is depicted in Fig. 2, indicating two inputs to the process, i.e., the depth images acquired with the depth camera and the robot poses recorded by the robot arm control unit. The depth images are used for extraction of the grid points of the mesh object and then georeferenced using robot pose information complemented by the hand–eye calibration parameters. The output of the pipeline is a digitally reconstructed mesh object. Overall, our proposed approach offers a fast, autonomous, accurate, and robust solution, which can be used for detecting and reconstructing complex double-curved rebar structures or other structures composed of linear elements. Further contributions are the demonstration of an innovative application of machine vision and point cloud data processing to digital fabrication, and the demonstration of the potential of state-of-the-art depth cameras to support robotic applications by contact-less surface digitization with mm-level accuracy.

Fig. 2

Acquisition and processing pipeline overview

The calibration of the robotic setup to obtain HE transformations and its evaluation are shown in Sect. 2. In Sect. 3 we assess mixed pixel influences on the acquired scans of rebar meshes. The proposed approach of depth image processing is described in detail in Sect. 4 and the accuracy assessment in Sect. 5. The sequence of steps of detecting the mesh structure within a setup is described in detail in Sect. 6.

2 System calibration

When depth cameras are rigidly mounted on a robot arm, the sensor data acquired at various robot poses can be directly georeferenced into a common coordinate system (CS). However, to be able to do this, one must determine the relation between the internal coordinate system (ICS) of the camera and the tool center point (TCP) of the robot. This can be achieved by a so-called HE calibration, e.g., using one of the existing methods from the literature (see Wu et al. 2019 for a comparison of various methods). In this paper, we use the Tsai calibration method (Tsai and Lenz 1989), selected because of its high accuracy and computational speed (Wu et al. 2019; Marchand et al. 2005), which are relevant for the quasi-real-time application presented herein.

A checkerboard target (in our case A1 format with 65 mm grid size) placed in the scene is acquired at 20 robot poses; at least six are required to solve for the HE calibration transformation matrix. These poses are randomly distributed in terms of their orientation and position on a half-sphere centered at the center of the target. At each pose, a data set is acquired, consisting of the TCP position and orientation recorded in the robot base coordinate system (RBCS), i.e., \(\mathbf {M}_{TCP}^{RBCS}\), and an intensity image of the checkerboard target used for estimating \(\mathbf {M}_{ICS_j}^{Target}\), where j is the camera index (see Fig. 4, showing only two of the four cameras). An example pose during the HE calibration in our setup is shown in Fig. 3. The robot used is an ABB IRB 60/2.05, which has a specified pose repeatability of 0.06 mm and a pose accuracy range of 0.5 mm to 1 mm (ABB Robotics 2021). This setup is used throughout the whole paper. We estimated the transformation matrix \(\mathbf {M}_{ICS_j}^{Target}\) by modelling the extrinsic camera orientation using the OpenCV Python library (Bradski 2000). The redundant observations improve the solution and allow for the elimination of gross errors. Having this information, one can solve for the three unknowns of the rotation matrix \(\mathbf {R}\) and the three unknowns of the translation vector \(\mathbf {t}\) contained in \(\mathbf {M}_\mathrm{TCP}^{\mathrm{ICS}_j}\), using the relation

$$\begin{aligned} \mathbf {M}_{ICS_j}^{Target} \cdot \mathbf {M}_\mathrm{TCP}^{\mathrm{ICS}_j} = \mathbf {M}_{RBCS}^{Target} \cdot \mathbf {M}_\mathrm{TCP}^{RBCS}. \end{aligned}$$
(1)
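For illustration, a minimal sketch of this estimation chain with OpenCV is given below. It assumes known camera intrinsics (`K`, `dist`), a list of intensity images, and the recorded TCP poses (`R_gripper2base`, `t_gripper2base`); the 9×6 corner layout is a hypothetical example, and `cv2.calibrateHandEye` with the `CALIB_HAND_EYE_TSAI` flag is one way to reproduce the Tsai–Lenz computation, not necessarily the exact implementation used here.

```python
import cv2
import numpy as np

PATTERN = (9, 6)       # inner checkerboard corners (hypothetical layout)
SQUARE = 0.065         # 65 mm grid size of the A1 target

# 3D corner coordinates in the target frame
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

R_target2cam, t_target2cam = [], []
for img in intensity_images:              # one intensity image per robot pose
    ok, corners = cv2.findChessboardCorners(img, PATTERN)
    assert ok, "corner detection must succeed for every pose in this sketch"
    # extrinsic camera orientation w.r.t. the target (cf. M_{ICS_j}^{Target})
    _, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
    R_target2cam.append(cv2.Rodrigues(rvec)[0])
    t_target2cam.append(tvec)

# R_gripper2base / t_gripper2base hold the recorded M_TCP^RBCS per pose;
# the result is the fixed camera-to-TCP (hand-eye) transformation.
R_he, t_he = cv2.calibrateHandEye(
    R_gripper2base, t_gripper2base, R_target2cam, t_target2cam,
    method=cv2.CALIB_HAND_EYE_TSAI)       # Tsai and Lenz (1989)
```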
Fig. 3

Image showing one of the poses during the HE calibration procedure

Fig. 4

Diagram of the various CSs within the robotic setup relevant for the HE calibration procedure. Only two of the four cameras are visible in this diagram

Now, all data sets acquired with the camera can be expressed in the RBCS using

$$\begin{aligned} \mathbf {b}_i = \mathbf {R}_\mathrm{TCP}^{\mathrm{ICS}_j} \cdot \mathbf {p}_i + \mathbf {t}_\mathrm{TCP}^{\mathrm{ICS}_j}, \end{aligned}$$
(2)

where \({\mathbf {p}_i}\) is a point in the point cloud acquired by the camera and \({\mathbf {b}_i}\) a point expressed in RBCS.
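A direct transcription of Eq. (2) in NumPy might look as follows; it is a minimal sketch in which `cloud` (the acquired points), `R`, and `t` (the estimated HE parameters) are assumed inputs.

```python
import numpy as np

def to_rbcs(cloud: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply Eq. (2), b_i = R p_i + t, to every point of an (N, 3) cloud."""
    return cloud @ R.T + t
```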

The standard deviations \(\sigma\) are computed for each component of the transformation matrix and each of the repeated calibrations. For one of these calibrations, the values are about 0.3 mm to 0.4 mm for the translations and 0.6 mrad to 1.5 mrad for the rotations, with the Y-axis value (vertical component of the camera’s coordinate system) being the most accurate one. The same \(\sigma\) values were obtained for all our calibrations. This high repeatability is expected, since the same set of robot poses was used for the different data acquisitions and the robot setup remained stable. The values are sufficiently low for the application proposed in this paper, especially considering the accuracy of the robotic arm used in the setup, as well as of the depth camera itself.

To evaluate the stability of the transformation parameters \(\mathbf {M}_\mathrm{TCP}^{\mathrm{ICS}_j}\), we repeated the calibration procedure several times over a day, and then again 3 days later. The rotation differences \(\varDelta \mathbf {v}\) and translation differences \(\varDelta \mathbf {t}\) between the repeated calibrations are small numeric variations, all far smaller than \(3\sigma\), i.e., \(\pm 4.5\) mrad for rotations and \(\pm 1.2\) mm for translations, and thus statistically insignificant. Based on the outcomes of this experiment, the calibration was sufficiently accurate and stable within the calibration uncertainty; however, we recommend carrying out the calibration once per week. This supports quality control and does not cause a significant delay of the construction process.

To empirically evaluate the quality of the implemented HE calibration, a scene with three spheres, two with a diameter of 15 cm and one with 12 cm (see Fig. 5a), was scanned using a depth camera from six different robot poses at distances of about 60 cm. The data sets were directly georeferenced using the recorded robot poses and manually cleaned of mixed pixels and background. The resulting point clouds are shown in Fig. 5b. The orthogonal distances of the measured points from best-fit spheres (diameter fixed to the known true value) have a root mean square error (RMSE) of 3.7 mm. This value represents the combined effect of measurement noise, pose uncertainties, and calibration uncertainties, and indicates the accuracy of surface point coordinates determined with the present system. The mm-level accuracy corresponds to the expectation based on the specifications of the depth camera and the evaluation of the HE calibration, and it is sufficient for the purpose of the application presented herein.
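A sphere fit of this kind can be sketched as follows, assuming SciPy is available: only the centre is estimated while the radius is fixed to the known value, and the returned RMSE corresponds to the orthogonal residuals described above.

```python
import numpy as np
from scipy.optimize import least_squares

def sphere_rmse(points: np.ndarray, radius: float) -> float:
    """RMSE of orthogonal distances to a best-fit sphere of fixed radius."""
    def residuals(c):
        return np.linalg.norm(points - c, axis=1) - radius
    fit = least_squares(residuals, points.mean(axis=0))  # centre only
    r = residuals(fit.x)
    return float(np.sqrt(np.mean(r ** 2)))

# e.g., sphere_rmse(cloud, radius=0.075) for one of the 15 cm spheres
```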

Fig. 5

Image of the scene with three spheres that were scanned from different camera poses (left) and the corresponding georeferenced point clouds, with different colors indicating the different underlying poses (right)

The calibration is an automatic process without the need for any user interaction, except for placing the target in the scene. Including the robot movements to the defined poses, the depth camera data acquisition, and the data processing, it takes approximately 5 min to complete, of which only about 30 s are needed for computing the HE parameters.

3 Mixed pixel effect

The most relevant specifications of the depth camera used in this paper, the Helios Lucid (Lucid Vision Labs 2021), are given in Table 1. Tips on its use, e.g., the suggested warm-up time to achieve the highest specified accuracy, are given in Frangez et al. (2020). Practical limitations of depth cameras are related to changes within the operating environment (mostly temperature), mixed pixels, and multipath effects. The latter occur when light bouncing off several surfaces arrives at the same pixel of the camera. The magnitude of the effect depends on the surface conditions and the geometry of the scene (Horaud et al. 2016). Most of these errors are shown in Carfagni et al. (2017) and can be mitigated by either calibration or careful setup design, where allowed by the application. The remainder of this section focuses on mixed pixels, a measurement bias briefly explained in Sect. 1, and on how they affect rebar edges in point clouds.

Table 1 Selected specifications of the Helios Lucid ToF depth camera (Lucid Vision Labs 2021); accuracy defined as measurement difference to the ground truth value, and precision as measurement repeatability

The camera poses, in particular the viewing angles and the distance to the objects to be measured, need to be carefully planned to ensure that there are enough non-mixed pixels. This means taking into account the dimensions of the structures, i.e., the rebar diameters, and the footprint, i.e., the area in the scene that corresponds to a single pixel. Figure 6 shows the footprint size as a function of distance, and the corresponding number of pixels per 4 mm (the rebar thickness in our case) for the camera used herein. The criteria for selecting the mesh are threefold: (i) the mesh holes need to be small enough so the material does not fly through, (ii) the rebars need to be thick enough to provide sufficient area for the material to stick to the structure, and (iii) the mesh needs to be stiff enough to provide sufficient structural support without significant deformation when material is applied to it. In general, the thicker the rebar mesh, the better it can be scanned from a given distance.

There is no configuration of camera and rebar mesh which would yield an image where each pixel corresponds either fully to an area on the rebars or fully to an area on the background. Most pixels along the edges of the rebar will be mixed pixels representing both rebar and background. The depth associated with these pixels is subject to mixed pixel errors (Wang et al. 2017; Chaudhry et al. 2021) and represents neither the distance of the rebar nor that of the background. The data processing needs to take this into account by appropriate filtering. Naturally, the goal is to have footprints much smaller than the rebar cross section, such that there is a sufficient number of non-mixed pixels representing the rebar in the point cloud. We conclude from Fig. 6 that distances below about 1 m are needed for scanning the rebar meshes in our application case.
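The relation plotted in Fig. 6 can be approximated with a simple pinhole-style footprint computation, as sketched below; the field-of-view and resolution values are assumptions standing in for the camera specifications of Table 1.

```python
import numpy as np

FOV_DEG = 69.0   # horizontal field of view (assumed; cf. Table 1)
N_PIX = 640      # horizontal sensor resolution (assumed; cf. Table 1)
REBAR = 0.004    # rebar diameter in metres

def footprint(distance_m: float) -> float:
    """Approximate side length of a single pixel's footprint at a range."""
    return 2.0 * distance_m * np.tan(np.radians(FOV_DEG) / 2.0) / N_PIX

for d in (0.4, 0.6, 1.0, 1.4):
    print(f"{d:.1f} m: footprint {footprint(d) * 1e3:.2f} mm, "
          f"{REBAR / footprint(d):.1f} px per rebar")
```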

Fig. 6

Theoretical relation between the footprint size and the scanning distance (red curve) and the number of pixels falling on a single rebar (blue curves). The computation is done for a rebar with a diameter of 4 mm

We investigated the mixed pixel effects in scans of rebar meshes, in particular the impact of the background distance to the mesh, based on preliminary observations that mixed pixels do not appear if the background is far away from the rebar. The intensity of the received signal is inversely proportional to the distance squared. This means that the return intensity of the part of the footprint on the background is much smaller than that of the part corresponding to the rebar if the background is sufficiently far away. Since the signal intensities also depend on the surface reflectivity and the angle of incidence, the minimum background distance for negligible mixed pixel effects also depends on the material and surface finish of the rebars and the background. The situation is most critical if the background is orthogonal to the lines-of-sight from the camera, and if it reflects much more strongly than the rebar. Theoretically, keeping the mesh at a distance of 0.4 m from the camera and the background at distances between 0.5 m and 1.3 m from the camera, the ratio of the received power from the foreground (rebar) to that from the background is between approximately 1.5 and 10, growing quadratically with the background distance.
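A minimal numeric check of this inverse-square argument, under the assumption of equally reflective surfaces:

```python
# Inverse-square ratio of received power: rebar in the foreground at 0.4 m
# versus background at selected distances from the camera
d_rebar = 0.4
for d_bg in (0.5, 0.9, 1.3):
    print(f"background at {d_bg} m: power ratio {(d_bg / d_rebar) ** 2:.1f}")
# -> 1.6, 5.1, 10.6: growing quadratically with the background distance
```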

In relation to this, we conducted an experiment to empirically determine the background distance beyond which the mixed pixel effect is negligible. We used a strongly reflecting background approximately perpendicular to the viewing direction of the camera, placed at various distances from the mesh. The depth camera was located at a fixed distance of 0.4 m from the rebar mesh, while the background was varied from approximately 0.4 m to 3 m behind the mesh. An example of a point cloud acquired with a background distance of 0.6 m from the mesh and another one acquired with a background distance of 2 m are shown from the side as well as zoomed-in from the front in Fig. 7. The grid points were extracted from these point clouds according to the approach explained in Sect. 4. Since the scanned mesh was quasi-planar, the accuracy of the extracted points could be evaluated by computing the RMSE of a plane fit. In the case when the background was closer to the mesh (1a, 1b), the extracted grid points are significantly affected by mixed pixel effects, the extracted rebar surface is concave instead of planar, and the RMSE at the rebar surface is about \(30\,\) mm. In the case of the background at a distance of \(2\,\) m, the point cloud is much less affected by mixed pixels, and the RMSE of the extracted rebar surface is \(6\,\) mm. Overall, mixed pixels no longer affect the rebar extraction critically at background distances beyond \(1.8\,\) m from the mesh in our experiments. While this value only holds for the specific camera, rebar, rebar distance, and background used in this experiment, the more general and thus important conclusion is that the setup planning and design for robotic spraying should allow for a sufficiently large background distance.

Fig. 7

Two point clouds used for grid point extraction and mixed pixel analysis (see text). Parts of the side view (1a, 2a) and front view (1b, 2b) plotted at different scales for scans with the rebar at \(0.4\,\hbox {m}\) and the background at \(0.6\,\hbox {m}\) (1a, 1b) and \(2\,\hbox {m}\) (2a, 2b) from the mesh. The two point clouds shown here are cut off at an approximate distance of \(0.2\,\hbox {m}\)

4 Grid point extraction approach

A suitable background distance during acquisition significantly reduces the number of mixed pixels as well as their influence on the point cloud. However, mixed pixels are still present in the data set, and the extraction of the rebar from the raw point clouds needs to take this into account. We propose an image-based approach combining established image processing tools applied to the depth images. The first part of the proposed solution is the extraction of grid points from the raw data, i.e., of points representing the (nearly orthogonal) intersections of rebar elements. We carry out the grid point extraction in the following five steps: (i) image pre-processing, (ii) horizontal and vertical line area extraction, (iii) intersection of lines to determine grid points, (iv) assigning depth to the grid points from the initial depth image, and (v) image post-processing. Their details are as follows (a condensed code sketch follows the list):

  • (i) Image pre-processing: The depth image of the acquired scene is first normalized to a range of [0, 255] (see the acquired scene in Fig. 8a), so that the image processing operations can be carried out using the OpenCV Python library (Bradski 2000). The image is then upsampled to double its original size using bilinear interpolation (Fig. 8b). Although upsampling does not improve the information content of the raw data, we found that it improves the results of the subsequent image processing steps, in particular in step (ii). The choice of the interpolation method did not affect the final results strongly, but bilinear interpolation led to the lowest RMSE of the extracted grid points compared to the other methods we tried, i.e., nearest-neighbour and cubic interpolation. The approach presented in this section performs best if the grid is aligned with the rows and columns of the image to within \(\pm 10^{\circ }\). To assure this, the orientation is estimated by converting the binarized image to a skeleton shape, i.e., a thin version of the original shape that is equidistant to its boundaries, using the scikit-image algorithm skeletonize (van der Walt et al. 2014). This input is then used for a line search using the Hough transform (Illingworth and Kittler 1988). The deviations of the resulting line angles from \(0^\circ\) and \(90^\circ\) are then used to calculate the rotation angle for approximately aligning the image of the rebar mesh with the rows and columns of the image. Finally, the image is smoothed with a Gaussian filter and morphologically closed using a dilation followed by an erosion (Comer and Delp 1999). The goal of closing is to remove small holes and disconnected components in the image while preserving the shape of the grid. The image used to illustrate the processing steps in Fig. 8 was acquired at a distance of \(0.45\,\hbox {m}\) and with a background distance of 2 m; 30 frames, each acquired with an exposure time of \(250\,\mu \hbox {s}\), were temporally averaged into a single image.

  • (ii) Horizontal and vertical line area extraction: To extract the horizontal line areas, the edges of the horizontal mesh structure of the image have to be detected. This is done by using a Sobel operator that outputs the first order derivative in the horizontal direction. The image is then binarized and the line areas are expanded using a morphological dilation. The image at this step is shown in Fig. 8c. The same procedure is repeated for the vertical lines (see Fig. 8d).

  • (iii) Determination of grid points: Once vertical and horizontal lines are identified, a logical conjunction between the two images is carried out, resulting in pixel blobs indicating grid areas (Fig. 8e). Blob contours are then computed, followed by pixel centroid computation for each of the blobs.

  • (iv) Assigning depth to the grid point: The output pixels at this step indicate the locations of the grid points in the image (Fig. 8f). We can now obtain the set of corresponding grid points in 3D space by extracting depth values in the original depth image at the identified grid point locations.

  • (v) Image post-processing: Finally, the 3D points are filtered to eliminate grid points which are implausible and thus likely erroneous. This can be done using prior knowledge, based on the assumption that the actual rebar mesh corresponds to the planned one within certain tolerance limits. In our case this is particularly easy, because our experiments involve a regular rebar mesh with a fixed grid size. Therefore, we carry out the plausibility control for each grid point by searching for its four nearest neighbours and comparing their respective distances to the rebar mesh grid size known from the design. Points whose distances agree with the expected ones to within \(3\,\hbox {mm}\) are kept; the rest are eliminated from the grid point set. The extracted grid points correspond to the front surface of the rebar mesh. If, e.g., a grid point representing the center of the rebar or the center of the weld between the longitudinal and transverse rebars is needed, an offset would have to be added at this step. The direction of the offset can be calculated from the nearby grid points, and the magnitude of the offset can be taken from the design of the mesh, if needed.
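The following condensed sketch illustrates steps (i)–(iv) with OpenCV and NumPy. It is a minimal sketch, not the tuned implementation: the rotation pre-alignment via skeletonization and Hough transform from step (i) is omitted for brevity, `depth` is assumed to be a float32 depth image in metres, and all kernel sizes, thresholds, and the pairing of gradient directions to line orientations are illustrative assumptions.

```python
import cv2
import numpy as np

def extract_grid_points(depth: np.ndarray):
    # (i) normalize to [0, 255], upsample 2x (bilinear), smooth, and
    # morphologically close (dilation followed by erosion)
    img = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
    img = cv2.GaussianBlur(img, (5, 5), 0)
    kernel = np.ones((5, 5), np.uint8)
    img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

    # (ii) first-order derivative transverse to the bar direction,
    # binarized (Otsu) and dilated into line areas
    def line_areas(dx, dy):
        edges = cv2.convertScaleAbs(cv2.Sobel(img, cv2.CV_32F, dx, dy, ksize=3))
        _, binary = cv2.threshold(edges, 0, 255,
                                  cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        return cv2.dilate(binary, kernel)

    horizontal = line_areas(0, 1)      # edges of the horizontal rebars
    vertical = line_areas(1, 0)        # edges of the vertical rebars

    # (iii) logical conjunction -> blobs at the grid crossings -> centroids
    blobs = cv2.bitwise_and(horizontal, vertical)
    contours, _ = cv2.findContours(blobs, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    points = []
    for c in contours:
        m = cv2.moments(c)
        if m["m00"] > 0:
            u = int(m["m10"] / m["m00"]) // 2   # undo the 2x upsampling
            v = int(m["m01"] / m["m00"]) // 2
            # (iv) read the depth at the grid point from the original image
            points.append((u, v, float(depth[v, u])))
    return points
```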

Fig. 8

Example of rebar grid point extraction on a single depth data set: a an image of the mesh used in the experiment, b depth image, c extracted horizontal and d vertical areas, e conjunctions, and f depth image with the intersections resulting from the extraction algorithm indicated by black dots

The algorithm presented herein is robust enough to be used for scanning a wide variety of conventional construction rebar meshes, which form a grid-like structure and have quasi-perpendicular intersections between the transverse and longitudinal rebars. The thicker the rebars, the better for the scanning and the corresponding data processing algorithm, although only up to a certain limit: if they are too thick, the morphological operators may no longer work, in particular near the borders of the image. The algorithm might also be suitable for grid point extraction from robotically fabricated structures, such as, e.g., mesh mould (Dörfler et al. 2019); however, this would require further investigation and possibly an adaptation of the algorithm, since there, instead of long transverse rebars, shorter rebar pieces are used to connect the longitudinal rebars. The investigation of grid point extraction for meshes which are significantly non-planar or for several overlapping meshes is left for future work.

5 Grid point accuracy assessment

Different acquisition distances and viewing angles, as well as different rebar diameters, influence the final result of the grid point extraction differently. We assume that the 3D accuracy of the extracted grid points of the scanned rebar meshes can be on the level of the specified depth camera accuracy (Table 1). To assess this, we conducted an experimental analysis and acquired a set of point clouds at various distances and angles of the camera with respect to the rebar mesh structure. Figure 9 (left) shows the configurations, with acquisitions at distances from 0.4 m to 1.4 m and angles from \(0^\circ\) to \(50^\circ\). A Leica AT960 laser tracker (LT), a Leica LAS handheld triangulation scanner, and a Leica T-probe were used for acquiring the ground truth data with an accuracy on the level of \(60\,\upmu m\).

To evaluate the accuracy of the depth camera, the position and orientation of its ICS had to be determined. For this, a temporary LT coordinate system (LTCS) was established in the lab using three spherically mounted reflectors. The external camera coordinate system (ECS) is realized with T-probe measurements on three front-face screws (Fig. 9, right). Then, to establish the relation between ICS and ECS, at least six pairs of data sets had to be acquired to solve for the six unknowns of the transformation matrix (i.e., three rotations and three translations), each pair at a different camera pose with respect to the target. A single data pair consists of an intensity image of a checkerboard target acquired with the depth camera, with data expressed in the ICS, and the ECS acquired with the T-probe. This data set is then used to compute the transformation between ICS and ECS.

We do this again according to the Tsai HE calibration method (Tsai and Lenz 1989), which is explained in Sect. 2. Here, instead of the RBCS we use the LTCS, an arbitrary coordinate system established with the LT, and instead of the TCP we use the ECS, which is realized with the T-probe. The standard deviations of the estimated translations and orientations of the transformation matrix \(\mathbf {M}_\mathrm{ICS}^\mathrm{ECS}\) are on the level of 0.3 mm and 0.7 mrad, respectively.

Fig. 9

Data acquisition configurations for accuracy assessment of grid point extraction at various distances and angles of the camera with respect to the rebar mesh (left). Sketch showing ICS and ECS on the camera front face, with the latter one realized with the T-probe measurements on the three front-face screws (right)

In the next step, all rebar mesh data sets acquired with the camera and those acquired with the T-scan are transformed into a common coordinate system. This is achieved based on the known ECS for each of the camera positions, measured with the T-probe. A point \(\mathbf {s}_i\) measured with the T-scan can be expressed in the ICS as \(\mathbf {q}_i\) with

$$\begin{aligned} \mathbf {q}_i = \mathbf {R}_\mathrm{ICS}^\mathrm{LTCS} \cdot \mathbf {s}_i + \mathbf {t}_\mathrm{ICS}^\mathrm{LTCS}, \end{aligned}$$
(3)

where \(\mathbf {R}\) denotes the rotation matrix and \(\mathbf {t}\) the translation vector, describing the transformation from the ICS to the LTCS.

Once the data are expressed in a common coordinate system, the comparison can be done between corresponding sets of points acquired with the depth camera and with the T-scan. Before the comparison, the grid points within the depth camera data set are extracted with the algorithm presented in Sect. 4. In the ground truth data set acquired with the T-scan, the grid areas are selected manually and then used to compute the 3D centroids corresponding to the grid point locations. The coordinate differences are then evaluated for each of the three directions and used to compute the empirical standard deviations of the differences \(\sigma\), shown in Fig. 10 for a rebar mesh with a rebar diameter of 4 mm. For all three directions (see the axis directions in Fig. 9), the \(\sigma\) values are on the level of 5 mm up to a distance of \(1\,\) m. Throughout this analysis, we assume the bias is \(\approx 0\). The \(\sigma\) values increase significantly if the distance grows beyond \(1\,\)m. The \(\sigma _Z\) values are larger than \(\sigma _X\) and \(\sigma _Y\), because \(\sigma _Z\) is directly affected by the accuracy of the ToF measurement principle, while the latter two largely depend on the resolution of the image sensor. There is an impact of the angle on the \(\sigma\) values; however, it appears not to be significant for angles below \(40^\circ\). The influence of the angle appears to be larger on \(\sigma _X\) and \(\sigma _Y\), likely because of the way the rebar mesh structure is built. The longitudinal rebars are positioned at the front and the transverse ones at the back of the mesh. Due to the rotation of the mesh about the Y-axis, the camera observes the side of the rebar rather than its front, causing a shift in the X-direction while not influencing the Y-direction.

In general, considering the accuracy and precision of the depth camera (Table 1), the uncertainty of the LT system, and the way the measurements were conducted, the results show the expected accuracy. A second data set was acquired on a different day using the same camera and the same measurement configurations. The \(\sigma\) values of the second data set confirmed the results presented herein. Specifically, the empirical standard deviation \(\sigma _{3D}\) of the differences between the two data sets is on the level of 0.8 mm, which is within the range of the specified repeatability of the depth camera.

Fig. 10

Empirical standard deviations of the differences \(\sigma _X\), \(\sigma _Y\), and \(\sigma _Z\) (\(1\cdot \sigma\)) of the grid points measured with a depth camera with respect to the LT data set, chosen as ground truth. Different colors of the curves indicate different mesh rotations with respect to the camera, as indicated by the colorbar

An important indicator of the success of the grid point detection is also the number of grid points detected at certain distances and angles. In the data sets up to a distance of 1 m, all grid points were detected, i.e., 156 points for views far enough to see the whole mesh grid. For acquisitions at a distance of 1.4 m, the number decreased to approximately 70 points. Data sets beyond the distance of 1.4 m were acquired as well; however, the number of points on the mesh structure was sparse (i.e., below 30) and did not allow for deriving reasonable conclusions; therefore, this part of the analysis was not included.

6 Mesh detection and digital reconstruction process

The RBCS is chosen as the reference CS within our robotic fabrication setup, meaning that all data acquired with the depth cameras, the process planning, and the trajectory generation are expressed in that system. A sequence of steps was developed to efficiently reconstruct rebar structures from several points of view, wherever in the work-space of the robot they are situated. The process is carried out in the following sequence: (i) coarse scanning, (ii) coarse reconstruction, (iii) refined scanning, (iv) refined reconstruction, and (v) spray path trajectory generation. A detailed description of each of the steps is as follows (a code sketch of the reconstruction follows the list):

  • (i) Coarse scanning: The work-space area in front of the robot is scanned from a fixed number of poses at different angles (Fig. 11a). For our particular setup, the acquisition is done from 16 defined poses. The robot poses are chosen such that they do not reach too far into the work-space, thereby avoiding possible collisions with the structure itself. The mesh is always positioned in the centre of the fabrication platform, i.e., within the optimal working range of the robot; this results in coarse scanning distances in the range from \(\sim\)0.8 m to \(\sim\)1.6 m. Some of the grid points may not be detected, but this is corrected in step (iii) with the refined scanning. The output of this step is a set of depth images with their corresponding camera poses.

  • (ii) Coarse reconstruction: Each of the images is processed according to the grid point extraction algorithm from Sect. 4. The extracted sets of 3D coordinates of the grid points are then transformed into the RBCS using the camera pose information. Because of the non-optimal scanning configuration, in particular the acquisition distances, the grid points detected at this step do not have the highest achievable accuracy; the outcome of this process is therefore a rough digital surface (Fig. 11b). This digital surface is generated using Open3D library functions (Zhou et al. 2018), with a Poisson surface reconstruction producing a triangle mesh based on the extracted grid points.

  • (iii) Refined scanning: The digital surface from the previous step is used as the input to the second scanning step, where a new set of poses is automatically generated (Fig. 11c). Here, each camera view covers a part of the rebar mesh at an acquisition distance of approximately 40 cm and with an overlap of 30% between views; both parameters were chosen empirically. The output of this step is a set of depth images with the corresponding robot poses.

  • (iv) Refined reconstruction: The grid points are extracted from the acquired images and then transformed into the RBCS (Fig. 11d). Some of the grid points are detected several times because of the overlaps between different views. A geometric median, i.e., the point in Euclidean space minimizing the sum of distances to the input set of points, is computed for all extracted grid points within a neighbourhood of 5 mm (a value determined empirically), resulting in a single extracted grid point for each mesh grid intersection. The median is chosen because it is more robust to outliers than, e.g., the mean. In addition, all points which do not correspond to any actual grid point have to be filtered out at this step. The filtering requires each point to have at least two neighbours within a certain range; otherwise, the point is excluded from the data set. In the next step, an accurate digital surface is reconstructed from the grid points (Fig. 11e) using Poisson surface reconstruction.

  • (v) Spray path trajectory generation: The digital surface of the rebar structure is an input to generate the spray path trajectory. Poses are generated at a distance of approximately 20 cm from the structure, following the local normal orientation. This distance is much larger than the uncertainty of the preceding surface detection (\(\sigma _{3D}\approx 5\,{{\mathrm{mm}}}\)) and thus collisions between the robot and the rebar mesh during the spraying process are avoided.
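A minimal Open3D sketch of the surface reconstruction in steps (ii)/(iv) and of the normal-offset idea behind step (v) is given below; `grid_points` is the (N, 3) array of extracted grid points in the RBCS, and the neighbourhood sizes, the Poisson depth, and the number of sampled poses are illustrative assumptions, not the tuned values of the actual implementation.

```python
import numpy as np
import open3d as o3d

# Grid points in the RBCS as an (N, 3) array (assumed input)
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(grid_points)
pcd.estimate_normals(search_param=o3d.geometry.KDTreeSearchParamKNN(knn=8))
pcd.orient_normals_consistent_tangent_plane(k=8)

# Steps (ii)/(iv): Poisson surface reconstruction -> triangle mesh
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=6)
mesh.compute_vertex_normals()

# Step (v): candidate spray positions ~20 cm from the surface along the
# local normals (the actual trajectory generation is more involved)
samples = mesh.sample_points_uniformly(number_of_points=200)
spray_xyz = np.asarray(samples.points) + 0.20 * np.asarray(samples.normals)
```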

Fig. 11

Sequence of steps for the on-site rebar mesh scanning and reconstruction process. a Coarse scanning with a fixed set of robot poses. b Coarse reconstruction. c Refined scanning using uniquely generated poses based on the rough surface input. d Grid points of the stitched individual data sets based on the recorded robot poses. e Fine reconstruction. f Robot spray trajectory generation based on the fine surface input

The digitally reconstructed surface of the rebar mesh can then be used to estimate its area, a relevant parameter for the subsequent process planning. Based on this and the anticipated thickness of the target structure, the approximate amount of material (i.e., concrete, fibers) required for the construction can be computed and prepared. In a controlled production facility, this allows for very targeted material use with a reduced amount of waste at the end of production. In this way, the process can be designed more sustainably, as an additional benefit of the advanced sensing technologies.
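Continuing the Open3D sketch above, the area and a rough material volume can be obtained as follows; the 30 mm layer thickness is a hypothetical value.

```python
# Surface area of the reconstructed triangle mesh, multiplied by an assumed
# target layer thickness to obtain an approximate material volume
area_m2 = mesh.get_surface_area()
thickness_m = 0.03                      # hypothetical layer thickness
volume_l = area_m2 * thickness_m * 1e3  # litres of material, approximately
print(f"area: {area_m2:.2f} m^2, material: ~{volume_l:.0f} l")
```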

The whole process presented in this section was applied in an experimental setup representing a realistic application example. A mesh of approximately \(2.5~\hbox {m}^2\) was scanned using one Helios Lucid depth camera mounted on the ABB IRB 60/2.05 robot. Excluding the preceding calibration (see Sect. 2), the whole process took about 10 min, with the computations requiring approximately 2 min on a computer with 15.8 GB RAM and an Intel Core i5-8279U CPU at 2.4 GHz. A robot pose during the scanning of the mesh is shown in Fig. 12.

Fig. 12

Image showing one of the poses during the rebar mesh scanning procedure. Here only two of the four depth cameras were in use

7 Conclusion and outlook

In this paper we demonstrated the use of depth cameras for rebar mesh scanning and digital reconstruction for robotic spraying applications. The achievable 3D accuracy, i.e., the difference between the camera measurements and the ground truth data of the extracted grid points, was on the level of 5 mm or better at distances closer than 1 m. Furthermore, we proposed an innovative step sequence for detecting and reconstructing the unknown geometry of a rebar mesh structure. It is an autonomous process, demonstrated on a sample rebar mesh of 2.5 \(\hbox {m}^2\), which took about 10 min using one depth camera. The data collection and processing time will likely increase quadratically with shorter acquisition distances compared to the application example herein, because the mesh area covered by the camera in one image decreases quadratically, and consequently more images are needed. To directly georeference the data sets acquired at various robot poses, we made use of a state-of-the-art hand–eye calibration procedure, where the calibration parameters are estimated based on acquired intensity images and the corresponding robot poses. In particular, the outputs of the presented algorithm can be used to estimate the approximate amount of material needed for construction. This allows for a reduction of material waste and an increase in process sustainability. Finally, the inclusion of sensing solutions in robotic setups for construction will open up commercial opportunities, e.g., automatic customized mass production of fabricated objects due to sufficient control during fabrication.

The approach of depth image acquisition and processing is primarily designed for a pre-fabrication facility, i.e., a controlled environment where, most importantly, neither the camera nor the mesh object is exposed to direct sunlight. This avoids saturation of the image sensor, which would reduce the quality of the acquired images and, therefore, of the extracted mesh grid points. In this respect, the algorithm would also work on-site in an uncontrolled environment, as long as the camera and the mesh object are protected from direct sunlight. Since the camera has its own illumination source, changing light conditions are not a problem. In addition, because the data acquisition time is short, i.e., only a few minutes, it is unlikely that the camera temperature would change enough to cause significant deviations in the depth images. If these deviations became an issue, a correction model taking the camera temperature as an input could likely be developed to mitigate the effect.

The algorithm presented herein is robust enough to be used for scanning a wide variety of conventional construction rebar meshes, provided they form a grid-like structure and have quasi-perpendicular intersections between the transverse and longitudinal rebars. Even though the algorithm was demonstrated in this paper only on rebars of 4 mm diameter, it is also applicable to thicker or thinner rebars, and accuracy similar to that found herein is to be expected. Thicker rebar meshes are more suitable for scanning and the corresponding data processing. If the rebar is thinner, shorter acquisition distances are needed for an accurate reconstruction.

The proposed algorithm might also be suitable for grid point extraction from robotically fabricated structures, such as, e.g., mesh mould; however, this would have to be further investigated and could require an adaptation of the algorithm. As part of future work, we will assess the limits of the presented algorithm in terms of scanning significantly non-planar meshes, and we will address how to scan overlapping areas between several rebar mesh mats.