1 Introduction

In industrial environments, various steps are taken to increase energy efficiency. These include the insulation of production facilities (pipelines, fittings, vessels, etc.) to prevent heat as well as cold losses. While a large proportion of the components is already insulated, more complex shapes (e.g. valves) and parts that are difficult to reach (e.g. at greater heights or on the ceiling) in particular remain uninsulated. According to the European Industrial Insulation Foundation (2021), uninsulated components in European industry represent an annual savings potential of 40 million tons of CO\(_{2}\), equivalent to the emissions of 10 million households. One reason for this is that industrial insulators still measure pipeline components by hand, which is not only time-consuming (especially when scaffolding or working platforms are needed to reach components on the ceiling or at greater heights) but also very imprecise.

In contrast to the insulation industry, the use of digital twins and Building Information Modelling (BIM) is already widespread among many plant owners and plant construction companies. Especially with regard to rebuilding work, maintenance, documentation and safety inspections, the use of BIM has become indispensable.

Laser scanners are usually used to create or update a BIM. However, this technology also has disadvantages. In addition to the high cost of the instrumentation, problems occur on reflective or shiny surfaces (e.g. in food industry plants). Despite great progress in research, modelling with commercial evaluation software on the basis of point clouds is still very time-consuming and, depending on the user's experience, complex.

To speed up and simplify the work processes (especially in the insulation industry), a pre-calibrated three-camera measurement system (Fig. 1) was, therefore, developed to measure the necessary plant components. Modelling the objects on the basis of a photogrammetrically generated point cloud is not expedient, since the low texture of the pipes and pipe components usually results in a very sparse point cloud (Brief 2021).

Fig. 1 Three-camera measurement system (horizontal/vertical base: approx. 650 mm)

In contrast, our approach uses the object edges for the reconstruction, the first step of which is edge detection. Based on the edges, straight pipes are reconstructed first; further components of the pipeline are then calculated using the 3D information of the already reconstructed pipes (Hart et al. 2022). The individually calculated objects are combined into a pipeline by a topological analysis, and the geometry is recalculated and optimised within an adjustment. For larger construction units that cannot be captured from a single point of view, a transformation can be applied to merge the individual viewpoints and their reconstructed parts. The as-built data are transferred via an interface to the planning software for the insulation. There, by specifying the insulation thickness and other parameters, the geometry of the insulation is calculated, which in turn can be transferred to the production machine via another interface. This digital work process increases the accuracy of fit of the insulation and reduces the need for rework. Even objects that were previously too complex or components that were difficult to access can be measured and subsequently insulated in this way.

In order to make the measurement system and the associated reconstruction software easy to handle for the target group of insulators, various automation procedures were implemented to facilitate and accelerate the manual reconstruction. Techniques from the fields of computer vision and photogrammetry as well as existing object knowledge from the standardisation context are used for this purpose.

At the beginning, the state of the art is discussed in Sect. 2. A short presentation of the measurement system and its calibration can be found in Sect. 3. Sections 4 and 5 present calculation and automation approaches for the reconstruction of cylinders and elbows, respectively. Section 6 is devoted to the determination of the type and dimension of flanges (in accordance with the current standards). This is followed by the presentation of an adjustment approach for the optimisation of the component parameters in Sect. 7, into which existing object knowledge (e.g. on the dimensions of the components) can also be integrated. Subsequently, in Sect. 8, a transformation method is presented that is based on the reconstructed objects themselves instead of control points. Section 9 discusses and evaluates the results. The paper ends with a conclusion as well as an outlook on AI-based object recognition in Sect. 10.

2 Related Work

Currently, laser scanners (LIDAR) are mainly used for the reconstruction of industrial plants and especially pipelines. Thus, we give a general overview of imaging and processing techniques, presenting LIDAR-based technologies as well as photogrammetric techniques and combined approaches.

2.1 LIDAR

Fidera et al. (2004) analyse the coverage of scan points on pipes for different materials. As a rule, this is less than 50% of what is theoretically possible for all materials investigated. Depending on the material, e.g. stainless steel or plastic, larger deviations may occur after fitting a cylinder. In this context, photogrammetry offers an advantage, since an edge-based reconstruction is possible independent of the material.

Kang et al. (2020) present a general workflow for scan-to-BIM applications for Mechanical, Electrical and Plumbing (MEP). By choosing appropriate parameters, it can be adapted to different scenarios. The authors evaluate the approach for both a LIDAR point cloud and a photogrammetrically generated point cloud from a UAV survey. In terms of recognition rate, the approach is slightly superior to commercial software packages for modelling.

For the automation of modelling based on the point cloud, various strategies have been developed in recent years, which can be divided as follows, according to Maalek et al. (2019).

2.1.1 Scan vs. BIM

Bosché et al. (2013) use an existing as-designed BIM model for the segmentation of pipes and installations in general, assuming that the construction approximately conforms to the design. Bosché et al. (2015) determine cylinder parameters by cross-sectioning the point cloud and the corresponding area of the BIM. In addition to pipes, Maalek et al. (2019) also model flanges in particular using an existing BIM, likewise working with a projection into the plane. In contrast to the aforementioned authors, Son et al. (2015) detect the objects and their position directly from the point cloud geometry, supported by pre-known component lists and dimensions. They perform the registration against the as-designed model after the reconstruction, whereas the other authors perform this step first. All in all, these methods offer a simple automation option, but are only suitable if the corresponding prior knowledge is available. If an as-designed BIM cannot be provided or if it deviates strongly from the actual construction, these methods fail.

2.1.2 Local Curvature-Based Methods

Kawashima et al. (2014) and Nguyen and Choi (2018) work with a surface-normal-based region-growing algorithm to segment the point cloud. For fitting the actual cylinder, Nguyen and Choi (2018) use the algorithm of Schnabel et al. (2007). Kawashima et al. (2014), on the other hand, use an eigenvalue decomposition to determine the cylinder orientation, and obtain the axis and the radius by projecting the cylinder points into a plane orthogonal to the cylinder orientation. A similar procedure is used by Tran et al. (2015). Dimitrov and Golparvar-Fard (2015), in contrast, use region-growing based on curvature and surface roughness; however, they focus only on the segmentation of the point cloud. Chan et al. (2020) investigate the fitting of elbows of different types into a scanned point cloud and the effects of different coverage levels. In general, these methods are mainly suitable for simple geometric shapes, such as the cylinder. They are less suitable for more complex components or those composed of different shapes (e.g. the flange). Problems can be expected especially where the surface and its normals change abruptly.

2.1.3 Hybrid Methods

Ahmed et al. (2013) and Liu et al. (2013) use prior knowledge and assume that the pipes run parallel to either the ground or the walls. They extract cylinders from point clouds by projecting them into 2D space. Since they use cross-sections, the problem reduces to finding 2D circles. The projection planes are parallel to the walls or the floor and orthogonal to each other. This assumption is certainly justified in some cases, but not generally valid; the procedure is, therefore, only applicable in certain scenarios.

2.1.4 Machine Learning

Perez-Perez et al. (2021) present a method for semantic segmentation. First, the point cloud is segmented using the algorithm of Dimitrov and Golparvar-Fard (2015). This is followed by geometric and semantic labelling using different classifiers, with labels assigned to segments via Markov Random Fields. However, the authors limit their approach to segmentation. For modelling pipelines, a subdivision into individual components as well as a geometric fitting for extracting the associated parameters is also necessary.

2.2 Photogrammetry

In general, the reconstruction of cylindrical objects is already possible with a single image. Doignon and de Mathelin (2007) present a closed-form solution for the case where the radius is known. Since in our case the diameter of the pipes is not known in advance, at least two images are needed. Calculation approaches with two or more images are provided by Navab and Appel (2006) using Plücker coordinates. Becke and Schlegl (2015) further present a least-squares adjustment for the image-based reconstruction of cylinders. Using multiple cylinders, the orientation of the cameras can also be solved (Navab and Appel 2006).

Veldhuis and Vosselman (1998) present an image-based computational approach for cylinders, which is an extension of the line feature introduced by Mulawa (1989). A least-squares fitting for pipes can be found in Ermes et al. (1998); the basis for the reconstruction is parameterised CAD models of the pipeline components. Tangelder et al. (2003) describe an extension of this system: they analyse the pipeline as a whole and integrate the neighbourhood relations between the components into the calculation. Bürger (1999) also uses photogrammetry and a least-squares fitting for the reconstruction of pipelines but, in contrast to Ermes et al. (1998), minimises the deviations not in image space but in object space. The photogrammetric approaches discussed provide useful results, especially with respect to accuracy. However, the reconstruction is not automated: the edges of the objects must be selected individually in the images before the calculation.

An automated approach for extracting pipes from underwater images is described by Rekik et al. (2018), who use an image descriptor and a Support Vector Machine to classify images with regard to the presence of pipes and to localise them using bounding boxes. The result of the approach is object recognition only; further processing and the reconstruction of the pipes are missing.

Photogrammetric reconstruction of pipelines with a consumer camera is explained by Ahmed et al. (2012). The authors calibrate the camera in advance and place control points to solve the orientation. The reconstruction is based on the generated point cloud, although direct reconstruction based on the images is also mentioned as an alternative. Similarly, Martin-Abadal et al. (2022) use photogrammetric point clouds, which are calculated from the images of a stereo camera system. Using Deep Learning and a modified version of the PointNet model, the points are classified (pipes, fittings or background) and then extracted together with the required information (position, orientation, length, diameter, etc.). We have also investigated the photogrammetric generation of point clouds from images taken in industrial plants (Brief 2021). Depending on the surface texture, this sometimes resulted in very sparse point clouds and problems with the reconstruction.

A comparison of terrestrial laser scanning and photogrammetric reconstruction of pipelines can be found in Ahmed et al. (2011). While the acquisition time is similar for both methods, the authors see an increased time requirement for photogrammetric data processing, which also demands more prior knowledge from the operator; laser scanning, on the other hand, entails higher instrumentation costs. Here we see potential for our approach: by automating the photogrammetric reconstruction, we want to make it faster and simpler, so that the advantages of photogrammetry prevail.

Early approaches to inspection systems for underwater pipelines are described, for example, by Tascini et al. (1996) and Narimani et al. (2009). Both use a camera-based system and localise pipes based on the edges in the image; the latter use the Hough Transform (Hough 1962) for this purpose. This approach may be useful underwater because, apart from pipes, there are generally few linear structures there. In industrial environments, in contrast, there are significantly more linear objects, so a reconstruction based purely on the Hough Transform leads to many errors (cf. Sect. 4.1).

Calculation methods for the measurement and reconstruction of complex, curved tubes, e.g. on machines or motors, are described by Guo et al. (2021) and Cheng et al. (2021). Guo et al. (2021) use images from a single camera. To extract the image edges for the reconstruction, the authors use a convolutional neural network, which gives better results than classical edge operators. The pipeline geometry is obtained by intersecting the edges, with submillimetre accuracy. Cheng et al. (2021) also omit edge operators like Canny and use a modified fully convolutional network based on U-Net. They use HDR images as input data and a custom loss function for training in order to account for redundant or missing edges in the output of the network. At first glance, the field of application thus seems similar; nevertheless, there are differences. In mechanical engineering, it is usually a matter of individually shaped tubes, which are described by a polyline and the diameter, and fittings (e.g. flanges or valves) do not usually occur there.

2.3 Hybrid Techniques (LIDAR and Photogrammetry)

Rabbani (2006) primarily uses laser scans for pipeline reconstruction, where hidden objects can be reconstructed and added with the help of images. For automatic cylinder extraction, he uses a modified two-stage Hough Transform. Single images can be oriented using the reconstructed objects, and additional objects can be computed via these. The author also describes an object-based transformation for registration.

An aerial survey using a drone is described by Guerra et al. (2018). Due to the disadvantages of photogrammetry in terms of automatic reconstruction and of point cloud-based computation in terms of computation time, the authors use the drone in combination with a multisensor system. In the point cloud, cylinders are first detected using RANSAC. Subsequently, the cylinders or pipes can be detected and fitted in the image based on their projection, which remains possible for the rest of the flight. A further development towards automation is described by Guerra et al. (2020), who train a neural network to detect the pipes in the images. Cheng et al. (2020) also use a neural network for the automatic extraction of the components. After the classification and clustering of the measurement points, a graph-based analysis of the pipe axes follows; finally, the components are fitted into the respective axis sections. A similar approach is used by Wang et al. (2022): in addition to the point cloud, images are captured and semantically segmented with the help of a neural network. Subsequently, depth maps with semantic information can be generated from the images. The segmentation can then be transferred to the LIDAR point clouds and the components extracted on this basis.

We currently do not see the combined use of photogrammetry and LIDAR as an advantage. On the contrary: this approach entails additional costs due to the duplicate set of equipment. In addition, more data are generated, and thus a greater effort is required for the processing (e.g. registration and orientation). Nevertheless, object recognition using Deep Learning is a promising approach, although the training process is complex due to the large amount of data required.

3 Proposed Measurement System

The proposed measurement system was designed especially for use in the insulation industry, where, according to the insulators, a measuring accuracy of 2–4 mm is required. For this reason, on the one hand, the comparatively cost-effective camera measurement technology was selected; on the other hand, the pre-calibration allows the system to be used by non-experts. However, the techniques presented in the following are also generally transferable to other photogrammetric evaluations, e.g. of hand-held images. In this context, a drone-based survey is planned for the future.

The measurement system (Fig. 1) consists of three industrial cameras in combination with three ring lights, which are permanently mounted in a triangular arrangement on a holder. The cameras are calibrated using the Pictran software (Technet GmbH 2023) within a common bundle block adjustment. A 3D calibration board with Inpho targets (Trimble 2023; Ahn and Schultes 1997) and known coordinates is used for this purpose, including vertical images. The RMS of the corrections (for the observed control point coordinates) after adjustment is approximately 0.2 \(\upmu\)m or 0.05 px. As a result, six distortion parameters (radial distortion, decentration, scale and shear), the principal point and the camera constant are available for each camera. Based on the determined calibration parameters, the error influences caused by the camera and the lens can be removed computationally, and a rectified image without lens distortions can be generated in the developed software. The parameters of the calibration and orientation are assumed to be stable during the object acquisition. If the upper camera has been removed for transport, its relative orientation can be re-determined in the field with at least five homologous points (Luhmann 2018). The distance between the two fixed horizontal cameras is used as scale information.
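As an illustration of the rectification step, the following minimal sketch removes lens distortions from an image. Since the six-parameter Pictran model is not reproduced here, OpenCV's Brown-Conrady model is used as a stand-in; the intrinsic values, coefficients and file names are placeholders, not values from the actual system.

```python
import cv2
import numpy as np

# Stand-in intrinsics: camera constant (in pixels) and principal point are
# placeholder values, NOT the calibrated values of the measurement system.
K = np.array([[4800.0, 0.0, 1024.0],
              [0.0, 4800.0, 768.0],
              [0.0, 0.0, 1.0]])
# Brown-Conrady coefficients (k1, k2, p1, p2, k3) as a stand-in for the
# six-parameter Pictran distortion model described above.
dist = np.array([-0.12, 0.05, 0.0003, -0.0002, 0.0])

img = cv2.imread("left_camera.png")
# Remove lens distortions so that straight object edges map to straight
# image lines, a prerequisite for the edge-based reconstruction below.
rectified = cv2.undistort(img, K, dist)
cv2.imwrite("left_camera_rectified.png", rectified)
```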

The system is designed for an acquisition distance of 3–10 m, where the field of view of the cameras at 10 m is about 3 \(\times\) 4 m. The system remains on the tripod during the measurement, while a swivelling bearing also allows the measurement of objects on the ceiling. The workflow from the recording through the reconstruction to the design and manufacturing of the insulation is illustrated in Fig. 2.

Fig. 2 Workflow for as-built reconstruction and manufacturing of insulation

4 Reconstruction and Detection of Cylinders

Pipe objects can be calculated directly on the basis of the images and the object edges they contain, using the approach presented by, for example, Bösemann (1996) or Bürger and Busch (2000). This is possible due to the radial symmetry of the pipe geometries, so that the entire object can be reconstructed on the basis of its edges in the images.

A cylinder is represented in the image by two straight lines corresponding to the edges of the cylinder surface (Fig. 3). For the reconstruction, two planes \({t}_{i,1}\) and \({t}_{i,2}\) can be generated per image i on the basis of the two edges of the cylinder surface, which run through the projection centre of the camera. The cylinder thus results as a tangential element to the four planes (\({t}_{1,1}\), \({t}_{1,2}\), \({t}_{2,1}\) and \({t}_{2,2}\)). Strictly speaking, three planes are already sufficient for the reconstruction (Dingle 1998). However, since this case yields an ambiguity (four solutions), the side on which the cylinder lies with respect to the edge must be defined for the image from which only one plane results.

As a prerequisite for calculation, there must be at least one camera pair whose base is not in a common plane with the cylinder axis and the projection centre. This condition is fulfilled by the three-camera measurement system. Thus, first the planes \({e}_{1}\) and \({e}_{2}\) can be computed, which contain both the 3D axis of the cylinder and the projection of this axis in the image (Fig. 3). According to Navab (2002), the normal vector \({n}_{e,i}\) of the plane \({e}_{i}\) is computed from the normal vectors \({n}_{t,1}\) and \({n}_{t,2}\) of the tangent planes \({t}_{i,1}\) and \({t}_{i,2}\), where these point away from the cylinder by definition:

$$\begin{aligned} {n}_{e,i} = \frac{{n}_{t,1}+{n}_{t,2}}{2} \end{aligned}$$
(1)

The corresponding projection centre \({P}_{0,i}\) can be used as a point of the planes \({e}_{i}\). The cylinder axis is then obtained by intersecting the two planes \({e}_{1}\) and \({e}_{2}\).
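The following minimal sketch implements this construction with NumPy; all function and variable names are hypothetical. It forms the plane normals according to Eq. (1) and intersects the two planes \({e}_{1}\) and \({e}_{2}\) to obtain the cylinder axis.

```python
import numpy as np

def cylinder_axis(n_t11, n_t12, P01, n_t21, n_t22, P02):
    """Cylinder axis as the intersection of the planes e1 and e2.

    n_tij : outward unit normals of the tangent planes from image i,
    P0i   : projection centre of camera i.
    Returns a point on the axis and the unit axis direction.
    """
    # Eq. (1): normal of the plane e_i containing the cylinder axis.
    n_e1 = (n_t11 + n_t12) / 2.0
    n_e2 = (n_t21 + n_t22) / 2.0
    # Axis direction: perpendicular to both plane normals.
    d = np.cross(n_e1, n_e2)
    d /= np.linalg.norm(d)
    # One concrete axis point: fulfil both plane equations and pick the
    # point of the intersection line closest to the origin (d @ x = 0).
    A = np.vstack([n_e1, n_e2, d])
    b = np.array([n_e1 @ P01, n_e2 @ P02, 0.0])
    return np.linalg.solve(A, b), d
```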

Fig. 3 Reconstruction of a cylinder (top view)

Thus, for the reconstruction of cylinders with the presented measurement system and the associated processing software, as in the preceding approaches of Bürger (1999), Dingle (1998), Hilgers et al. (1998), Mischke and Rieks (2001), Navab (2002) and Tangelder et al. (2000), a manual selection of the image edges (for spanning the tangential planes \({t}_{i,j}\)) is necessary. A fully automatic reconstruction of cylinders and elbows based on image data is described by Bösemann (1996) and Jin et al. (2016). However, these techniques are used in special measurement systems with active background illumination, so that occlusions by other objects or disturbing light effects do not occur and automation is easily possible there. These conditions are usually not met for images of industrial plants, which makes automation more difficult.

4.1 Fully Automated Brute Force Approach

Nevertheless, a fully automatic reconstruction based on extracted lines was investigated first as part of this study. Classical operators for detecting lines in the image, such as the Hough Transform (Hough 1962) or the Progressive Probabilistic Hough Transform (Matas et al. 2000), require the image edges, which are also needed for the subsequent calculation steps anyway. Therefore, the image edges are computed beforehand with subpixel accuracy using the algorithm presented by Trujillo-Pino et al. (2013). Overall, it has been shown that the lines extracted using the Hough Transform lead to many false detections, especially on a noisy data basis, and require a careful choice of parameters. Therefore, the LineSegmentDetector (Gioi et al. 2012) was used instead, which requires no parameters and yields fewer false detections. Next, all line pairs with approximately the same direction are detected per image; these pairs represent the edges of a potential cylinder (see the sketch below). In a brute-force approach, the line pairs are then matched across the images and intersected in object space.
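The per-image candidate generation can be sketched as follows; the segment representation and the angle threshold are assumptions, and the subsequent cross-image matching and object-space intersection are omitted.

```python
import numpy as np

def direction(seg):
    """Unit direction of a 2D line segment given as (x1, y1, x2, y2)."""
    d = np.asarray(seg[2:], float) - np.asarray(seg[:2], float)
    return d / np.linalg.norm(d)

def candidate_pairs(segments, max_angle_deg=3.0):
    """All pairs of segments with approximately the same direction.

    Each pair is a potential pair of cylinder silhouette edges; the angle
    threshold is an assumed value, not one from the paper.
    """
    dirs = [direction(s) for s in segments]
    cos_min = np.cos(np.radians(max_angle_deg))
    return [(i, j)
            for i in range(len(segments))
            for j in range(i + 1, len(segments))
            # |cos|, since the two silhouette edges may be stored with
            # opposite orientation.
            if abs(dirs[i] @ dirs[j]) >= cos_min]
```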

The following filters are applied to eliminate incorrect combinations:

  • Average depth of the cylinder: the reconstructed cylinder lies within the working range of the measurement system

  • Diameter of the cylinder: the diameter of the reconstructed cylinder is close to a normalised value from DIN (2003)

  • Intersection quality: the planes generated from the edges in the image are tangent to the cylinder with as little deviation as possible; the skew distance between the cylinder axis and the image rays based on the edges corresponds to the radius

Fig. 4 Detected (mostly horizontal) cylinders on the pipeline dummy using the fully automatic brute-force algorithm

Figure 4 shows the fully automatically detected cylinders with a predominantly horizontal course. All cylinders match the above filter criteria. Despite some correct pipes, many false detections are also included. In addition, due to the large number of possible line combinations, the computation time is very long: in the present case, it amounted to approx. 15 min with a single-threaded implementation, plus another 15 min for the predominantly vertical cylinders. Compared to images from real industrial plants, Fig. 4 tends to contain fewer line-shaped edges, so that with even more lines the quartic increase in computing time becomes enormously noticeable. Roughly, the number of possible combinations \({n}_{k}\) can be calculated as follows:

$$\begin{aligned} {n}_{k} = {n}_{L1} * \left( {n}_{L1}-1\right) * {n}_{L2} * \left( {n}_{L2}-1\right) \end{aligned}$$
(2)

Assuming approximately the same number of lines \({n}_{L1}\) and \({n}_{L2}\) in the first and second images, respectively, the equation simplifies to

$$\begin{aligned} {n}_{k} \approx {n}_{L1}^4 \end{aligned}$$
(3)

In practice, however, \({n}_{k}\) turns out to be smaller, since some combinations can be discarded due to the pre-filtering based on the line direction.

In sum, however, it can be stated that this approach is not very practical due to the high computation time and the poor results.

4.2 Semi-automatic Approach

In addition to full automation, a semi-automatic solution is also conceivable, in which the user manually selects a pair of lines in one image, i.e. the two edges of the cylinder. If the approach presented in Sect. 4.1 is pursued further, the preselection still leaves \({n}_{L2}^2\) possible combinations, so a performant computation is not possible in this way either.

However, the preselection in one image opens up the possibility of another procedure: in analogy to the epipolar line, a search range for the cylinder can be defined in the second image. If depth information for the cylinder were available, the search range could be narrowed down even further; from a single image, however, the depth is unknown. It is well known that there is a dependency between depth and dimension: the cylinder can be very close to the image plane but of small dimension, or far away and of large dimension. Conversely, if the dimension of the cylinder is known, the depth can be derived. In this context, DIN (2003), which specifies concrete values for the pipe diameter, is helpful: for each diameter, a depth and thus a concrete 3D cylinder can be calculated. From the set of 3D cylinders, only the one with the correct diameter must finally be selected, which is now easily possible via the comparison with the second image.

Under certain circumstances, it is useful to restrict the value range of the cylinder diameters before starting the calculation. While DIN (2003) specifies diameters up to 2.5 m, the full range is rarely needed. After the user selects the edges in one image, the possible cylinders are calculated for the selected diameter range. The cylinder axis is obtained by intersecting the view planes t offset in parallel by the radius r (see the sketch below).
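A minimal sketch of this hypothesis generation is given below. The tangent-plane normals are assumed to point away from the cylinder, so offsetting each plane by r towards the cylinder and intersecting the offset planes yields the candidate axis; the diameter list is a small illustrative excerpt of normalised outside diameters, not the full table from the standard.

```python
import numpy as np

# Illustrative excerpt of normalised pipe outside diameters in metres.
STANDARD_DIAMETERS = [0.0213, 0.0269, 0.0337, 0.0483, 0.0603, 0.0889, 0.1143]

def axis_for_radius(n_t1, n_t2, P0, r):
    """Candidate cylinder axis from a single image and a hypothesised radius.

    The tangent planes (outward unit normals n_t1, n_t2, both passing
    through the projection centre P0) are offset in parallel by r towards
    the cylinder; their intersection line is the candidate axis.
    """
    d = np.cross(n_t1, n_t2)
    d /= np.linalg.norm(d)
    A = np.vstack([n_t1, n_t2, d])
    # Offset plane equations: n @ x = n @ P0 - r (towards the cylinder).
    b = np.array([n_t1 @ P0 - r, n_t2 @ P0 - r, d @ P0])
    return np.linalg.solve(A, b), d

def candidate_cylinders(n_t1, n_t2, P0, d_min=0.02, d_max=0.20):
    """One 3D cylinder hypothesis per standard diameter in the chosen range."""
    return [(D,) + axis_for_radius(n_t1, n_t2, P0, D / 2.0)
            for D in STANDARD_DIAMETERS if d_min <= D <= d_max]
```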

Fig. 5 Calculated cylinders based on image edges and concrete diameters

The possible cylinders (Fig. 5) can be further narrowed down based on the working range of the measurement system (3–10 m) according to Fig. 6.

Fig. 6 Limitation of the possible diameters based on the working range

Fig. 7 Projections of the possible cylinders in the upper image of the measurement system, red: correct cylinder

Fig. 8 Projections of the possible cylinders in the right image of the measurement system, red: correct cylinder

In Figs. 7 and 8, starting from a manual selection of the cylinder edges in the left image of the measurement system, the other two images are shown with the projections of the possible cylinders (after filtering based on the working range). The correct cylinder, i.e. the correct cylinder diameter (in red), can easily be identified visually. For the computer to make this decision, the degree of correspondence between the projection of the cylinder and the object edges in the image is examined. For this purpose, the distances to the object edges in the image are determined at an arbitrary interval s along the backprojected edges (Fig. 9). The search is performed orthogonally to the projection and, with respect to computation time, within a defined range b. If no edge pixel is found in the given search range b, the value b or \(\frac{b}{2}\) is used for the distance \({d}_{i}\). To generate a quality criterion \({q}_{c}\) for the respective cylinder c, the RMS of the n distances \({d}_{i}\) is calculated:

$$\begin{aligned} {q}_{c} = \sqrt{\frac{\sum \nolimits _{i=0}^{n} {d}_{i}^2}{n}} \end{aligned}$$
(4)
Fig. 9 Image section with edge pixels (red) of a pipe, back projection of the approximate pipe (green) and the determined deviations \({d}_{i}\) (blue) at the interval s and in the search range b around the back projection. In the upper left region, no edge pixels and thus no distances \({d}_{i}\) could be found due to too small a search range

Starting with the cylinder c with the smallest value \({q}_{c}\), a ranking can then be made for the most likely cylinder.
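A minimal sketch of the quality value from Eq. (4) follows; the paper searches orthogonally to the projection within the range b, which is simplified here to a nearest-neighbour search over the detected edge pixels.

```python
import numpy as np

def quality_q(projected_samples, edge_pixels, b=10.0):
    """RMS quality value q_c according to Eq. (4).

    projected_samples : 2D points sampled at interval s along the
                        backprojected cylinder edges,
    edge_pixels       : (N, 2) array of (sub)pixel edge positions.
    Simplification: the nearest edge pixel replaces the orthogonal profile
    search of Fig. 9; if nothing lies within b, b/2 is used as fallback.
    """
    dists = []
    for p in projected_samples:
        d = np.linalg.norm(edge_pixels - p, axis=1).min()
        dists.append(d if d <= b else b / 2.0)
    d = np.asarray(dists)
    return np.sqrt(np.sum(d**2) / len(d))
```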

Figures 7 and 8 show that the red cylinder corresponds best with the image data, but deviations remain. The reason for this is the generally poor intersection geometry of the parallel-offset tangential planes to the cylinder. In particular, for distant cylinders with small diameters, a very acute intersection angle results, so that the 3D geometry of the resulting cylinder can only be understood as an approximate value. For a final calculation, therefore, the image information from the other two images should also be used.

For this purpose, the bounding box of the projection of the approximate cylinder is calculated in the other two images and enlarged by a buffer in order to compensate for the uncertainty of the projection's position. The lines contained in the resulting image section are then detected, here using the Progressive Probabilistic Hough Transform. Due to the small image section, which is roughly localised in the area of the pipe, relatively few false detections usually occur. To find the correct line pair and eliminate false detections, the 3D intersection geometry is examined for all line combinations in the image section. The diameter of the calculated cylinder can also be used as a filter: if there are large deviations from the nearest standard diameter, the cylinder and the associated line combination can be discarded. If the correct edges were found in the images, the cylinder geometry can be updated from them. For a final calculation, the adjustment approach presented in Sect. 7 is recommended.

Particularly in the case of pipes with a small diameter at a large distance, but also in the case of uncertainties in the manual selection of the edges in the initial image, there are deviations in the parameters of the initial cylinder, which in turn influence the search range for the other edges in the remaining two images. If larger deviations occur during the calculation of the initial cylinder, the entire calculation procedure can fail under certain circumstances.

For predominantly vertical cylinders, however, a step-by-step calculation can help. If the edges were selected manually in the left (or right) image, the initial cylinder should first be optimised using the image from the upper camera before the third image is evaluated. Figures 7 and 8 illustrate this: in Fig. 7, the deviations between the red cylinder projection and the actual pipe in the image are much smaller than in Fig. 8. A nearly vertical cylinder was reconstructed, and in this case the parallax in the x-direction is crucial for the reconstruction (Hart et al. 2022). Since the x-parallax between the upper and the left camera is only half as large as that between the right and the left, the deviations in Fig. 7 are much smaller than in Fig. 8. Consequently, the object edges can be detected more easily in the image of the upper camera (Fig. 7). On this basis, the geometry of the initial cylinder can be improved thanks to the better intersection geometry (compared to the reconstruction from the single image). This in turn reduces the deviations between the projection and the actual pipe in the right image and creates better conditions for detecting the cylinder edges there.

5 Detection of Elbows

While it is easy to select the linear edges of a cylinder or flange in the images, this is more difficult for elbows. The appearance of the elbow contours is manifold and depends on the perspective. However, since elbows usually do not occur alone but are adjacent to straight pipe elements, the reconstruction can be facilitated by integrating the following information:

  • Diameter of the pipe or elbow

  • Start and end direction or axis

The only remaining unknown is the bending radius, on which in turn the start and end points of the elbow depend (Veldhuis and Vosselman 1998).

To determine the unknown bending radius, it can be modified until the projection of the reconstructed elbow coincides with the object edges in the image. The procedure can be facilitated by including standardised bending radii. In addition to the diameter, which is also specified for elbows by DIN (2003), the bending radius is specified in DIN (1999). Specifically, it depends on the diameter and is roughly 1, 1.5 or 2.5 times the diameter (D2, D3 and D5 elbows).

Automation is also possible for elbows. The prerequisite is that the objects connecting at both ends have already been reconstructed. In the case of straight pipes, a connection through an elbow is only possible if the two pipes or their axes are (approximately) coplanar. In general, this means that there must be coplanarity between the axes at the end points (given by the end point and the corresponding axis direction) of two existing objects (see the sketch below). The search for potential elbow positions is thus also conceivable, for example, between a straight pipe and an already reconstructed elbow.
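The coplanarity condition can be tested with the scalar triple product, as in the following sketch (tolerance values are assumptions):

```python
import numpy as np

def elbow_possible(p1, d1, p2, d2, tol=1e-3):
    """Necessary condition for a connecting elbow between two pipe axes.

    p_i : end point of pipe i, d_i : unit axis direction at that end point.
    Two lines are coplanar iff the scalar triple product of d1, d2 and the
    connecting vector vanishes; tol is an assumed tolerance. (Almost)
    parallel axes are excluded, since they cannot be joined by an elbow.
    """
    n = np.cross(d1, d2)
    if np.linalg.norm(n) < 1e-6:          # parallel axes
        return False
    return abs(np.dot(n, p2 - p1)) <= tol
```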

In addition to these mandatory conditions, there are also optional conditions that are not always fulfilled but can be used to speed up the search and avoid false detections. These include the distance between the imaginary intersection of the axes and the nearest pipe end point, which should be only slightly larger than the radius of the elbow. In addition, there should be no other collinear objects near the intersection point, since these would indicate a T-piece.

Assuming an elbow according to DIN (1999), there are three possibilities for the appearance of the elbow at a detected point (resulting from the three possible bending radii). In order to verify the detected location and to find the suitable bending radius, a quality value \({q}_{b}\) can be determined for each elbow (type) b (Fig. 10) in analogy to the procedure for finding the suitable cylinder diameter. This quality value reflects the average distance between the projection and the object edges in the image according to Fig. 9 and Eq. 4. If the smallest value is below a threshold, the elbow is confirmed.

Fig. 10 Proposed standard elbows with different bending radii as a clamped element between two pipes

If no standard elbow with normalised radius is available, the radius can also be determined individually. More details will follow in Sect. 7.3.

6 Determination of Flange Parameters

The reconstruction of flanges is possible in a similar way. Since the flange is essentially a (short) cylinder from a geometric point of view, a reconstruction analogous to Sect. 4 would be conceivable. However, reconstructing the flange on the basis of the straight edges of the flange face usually leads to very poor results, concretely to a tilting of the flange axis, since the flange is very short. Defining the flange axis based on an adjacent element, however, improves the calculation. In this case, the flange can be defined by selecting a point on the straight edge of the flange face. The diameter or radius of the flange face is obtained via the distance between the skew straight lines resulting from the associated image ray and the predefined flange axis (see the sketch below). To determine the position and length of the axis segment of the flange, the edge is traced and extracted starting from the manually selected edge point, using the existing knowledge about the edge direction. By projecting the edge end points onto the flange axis, the position of the flange face is then obtained. As with the appearance of elbows, that of flanges is also standardised. Figure 11 shows some frequently occurring flange types from DIN (2018).
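The radius determination thus reduces to the distance between two skew lines, as the following sketch illustrates (names hypothetical):

```python
import numpy as np

def skew_line_distance(p1, d1, p2, d2):
    """Shortest distance between two (skew) lines, each given by a point
    p_i and a unit direction d_i."""
    n = np.cross(d1, d2)
    nn = np.linalg.norm(n)
    if nn < 1e-12:
        # Parallel lines: fall back to the point-to-line distance.
        return np.linalg.norm(np.cross(p2 - p1, d1))
    return abs(np.dot(p2 - p1, n)) / nn

# Hypothetical usage: the flange-face radius as the distance between the
# image ray (projection centre P0, ray direction r_dir through the selected
# edge point) and the predefined flange axis (A0, a_dir):
# radius = skew_line_distance(P0, r_dir, A0, a_dir)
```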

Fig. 11 Selected flange types according to DIN (2018)

In addition to the distinction by flange type, a further distinction can be made on the basis of the pressure class. The individual types are available in different pressure classes, the pressure class determining the dimensions (Fig. 12). Table 1 shows an extract of the flange dimensions using the example of a DN40 weld neck flange. Although DIN (1975a, b, c, d) are obsolete standards, components according to these standards can still be found in plants and on the market. This is problematic in that the dimensions of the individual flanges within the same pressure class differ slightly in some cases, as Table 1 shows. The table also shows that the dimensions of a flange do not necessarily allow conclusions to be drawn about its pressure class, as the dimensions are in some cases identical across pressure classes.

Fig. 12 Dimensions of the weld neck flange, from DIN (2018)

Table 1 Dimensions of a DN40 weld neck flange

The user can select the flange type and pressure class from an automatically generated list (based on the known outside diameter) and verify or adjust it using the associated live projection into the image. The position of the flange is adjusted when selecting type and pressure class so that the position of the flange face remains unchanged.

As an alternative to the method just presented, the possibility of attaching flanges to existing objects has also been implemented in the reconstruction software. Here, the flange starts at the existing end point.

In addition to the manual selection of the correct flange type and pressure class, automation has also been integrated here. This is also based on the degree of correspondence between the projection of the flange f (specified by the type and the pressure class) and the edges in the image (Fig. 13). Numerically, this is captured in the value \({q}_{f}\), which is determined in analogy to the elbow and according to Eq. 4.

Fig. 13 Proposed flanges for axial reconstruction with manually set edge point

7 Adjustment Approach for the Optimisation of the Pipe Parameters

The calculation methods presented so far are of a purely geometric nature: no object knowledge can be integrated, or at most indirectly. Depending on the position and course of the edges in the image, the result is, for example, a cone instead of a cylinder. Also, the determined dimensions are usually not equal to the values from the standards (for example, the pipe diameter according to DIN (2003)). To improve the reconstruction quality, this knowledge can be integrated into the calculation. The model knowledge is taken into account in the least-squares adjustment procedure according to Tangelder et al. (1999) described below.

7.1 Adjustment for Individual Objects

Within the adjustment, the parameters of the pipe objects (position, direction or rotation, and object-related parameters such as the length or diameter) are estimated, where the result of the geometric reconstruction can serve as an initial value. Tangelder et al. (1999) fit the edges of the object to pixels with high grey-value gradients using a two-stage approach: first, they use the distances to pixels with a high gradient in the range of the projected edge as observations, introducing the squared gradient values as weights; for the final iterations, they then fit the pre-aligned edges of the object, modelled as a Gaussian-smoothed step edge, to the image edges.

In contrast to Tangelder et al. (2000), we use a one-step approach. We form the observations from the orthogonal distances \({d}_{i}\) between the edge of the object and the nearest edge pixel (at subpixel level) (Fig. 9). The distances between the backprojected edges and the image edges are calculated (similarly to Sect. 4) at an arbitrary interval s along the backprojected edges (Fig. 9). If there is no edge pixel in the search area, no observation is formed and the search continues at the next point on the backprojected edge at the interval s.

As a rule, the least-squares adjustment can be terminated after 5–10 iterations and is thus faster than that of Tangelder et al. (1999), who quantify the number of iterations at 10–20. Similar to Tangelder et al. (1999), we use pseudo-observations to account for existing model knowledge. For example, we introduce the nearest pipe diameter according to DIN (2003) as an observation with a high weight.

The formulation of the functional model in closed form is difficult due to the complex and non-linear relationships. Among other things, these include the calculation of the 3D points on the object geometry that create the visible edges in the image, the mapping process into the image, and finally the calculation of the distances in the image orthogonal to the projection. Due to the lack of an explicit functional relationship, the partial derivatives cannot be determined directly either. Ermes et al. (1999), therefore, propose a stepwise, analytical calculation:

First, the change of an “edge-generating” point P (a point on the 3D object that lies on a line with the projection centre and the associated edge pixel in the image) in object space is considered. For example, an edge-generating point P on the cylinder experiences approximately the same displacement as the cylinder itself (Ermes et al. 1999). The shift vector \(\varDelta\), i.e. the derivative \(\frac{\partial P}{\partial {t}_{x}}\) of the point P with respect to the translation \({t}_{x}\) in the x-direction, is thus:

$$\begin{aligned} \varDelta = \frac{\partial P}{\partial {t}_{x}} = \left( \begin{array}{c}1\\ 0\\ 0\end{array}\right) \end{aligned}$$
(5)

Similarly, for a point P on a cylinder whose axis corresponds to the z-axis, the derivative with respect to the radius r is obtained in the radial direction:

$$\begin{aligned} \varDelta = \frac{\partial P}{\partial r} = \left( \begin{array}{c}{P}_{x}/r\\ {P}_{y}/r\\ 0\end{array}\right) \end{aligned}$$
(6)

The projection of the shift vectors or derivatives from object space into the image plane with image coordinates u and v is described by Lowe (1991):

$$\begin{aligned} \frac{\partial u}{\partial {p}_{j}}= -\frac{c}{{P}_{z}}\left( {\varDelta }_{x}-\frac{{P}_{x}}{{P}_{z}}{\varDelta }_{z}\right) \end{aligned}$$
(7)
$$\begin{aligned} \frac{\partial v}{\partial {p}_{j}}= -\frac{c}{{P}_{z}}\left( {\varDelta }_{y}-\frac{{P}_{y}}{{P}_{z}}{\varDelta }_{z}\right) \end{aligned}$$
(8)

Here, c is the camera constant and \(\varDelta\) is the shift vector calculated in 3D space, i.e. the derivative \(\frac{\partial P}{\partial {p}_{j}}\) with respect to the parameter \({p}_{j}\). In the last step, Ermes et al. (1999) determine the component \({s}_{o}\) of the derivatives \(\frac{\partial u}{\partial {p}_{j}}\) and \(\frac{\partial v}{\partial {p}_{j}}\) orthogonal to the projection:

$$\begin{aligned} {s}_{p} = \frac{|e\cdot s|}{||e||^2}e \end{aligned}$$
(9)
$$\begin{aligned} {s}_{o} = s-{s}_{p} \end{aligned}$$
(10)

with

$$\begin{aligned} s =\left( \frac{\partial u}{\partial {p}_{j}},\frac{\partial v}{\partial {p}_{j}}\right) \end{aligned}$$
(11)

e is the direction of the projection in the image at the corresponding image point. Thus, the derivative \(\frac{\partial {d}_{i}}{\partial {p}_{j}}\) of the observed distance \({d}_{i}\) with respect to the (unknown) parameter \({p}_{j}\), after applying the sign function, results in:

$$\begin{aligned} \frac{\partial {d}_{i}}{\partial {p}_{j}} = sign({s}_{o,x}{e}_{y}-{s}_{o,y}{e}_{x})\, ||{s}_{o}|| \end{aligned}$$
(12)
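The chain of Eqs. (5)–(12) can be summarised in a few lines, as the following sketch shows. The standard orthogonal projection is used for Eqs. (9)/(10), the result is returned as the signed magnitude of \({s}_{o}\), and the point P is expected in camera coordinates; all names are hypothetical.

```python
import numpy as np

def distance_derivative(delta, P, c, e):
    """Derivative of one observed distance d_i w.r.t. a parameter p_j,
    composed from Eqs. (5)-(12).

    delta : 3D shift vector dP/dp_j of the edge-generating point (Eqs. 5/6),
    P     : edge-generating point in camera coordinates,
    c     : camera constant,
    e     : 2D direction of the projected edge at the image point.
    """
    # Eqs. (7)/(8): project the 3D shift into the image plane (Lowe 1991).
    du = -(c / P[2]) * (delta[0] - (P[0] / P[2]) * delta[2])
    dv = -(c / P[2]) * (delta[1] - (P[1] / P[2]) * delta[2])
    s = np.array([du, dv])
    # Eqs. (9)/(10): component of s orthogonal to the projected edge.
    s_p = (np.dot(e, s) / np.dot(e, e)) * e
    s_o = s - s_p
    # Eq. (12): signed magnitude of the orthogonal component.
    return np.sign(s_o[0] * e[1] - s_o[1] * e[0]) * np.linalg.norm(s_o)

# Example for the radius of a cylinder aligned with the z-axis (Eq. 6):
# delta = np.array([P[0] / r, P[1] / r, 0.0])
```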

7.2 Common Adjustment with Conditions Between the Objects

In addition to the image data and the normalised values for the object-specific parameters, observations can also be formulated to enforce conditions across the entire pipeline (Tangelder et al. 2003). These include the component transition without axial offset as well as transitions continuous in direction and diameter (Ermes 2000). To model the transition without axial offset, the vector w is first determined from the end point \({E}_{1}\) of component 1 and the start point \({S}_{2}\) of the subsequent component 2:

$$\begin{aligned} w = {S}_{2} - {E}_{1} \end{aligned}$$
(13)

The vector w yields, separated by coordinate axes, three observations (\({w}_{x}\), \({w}_{y}\), \({w}_{z}\)), which can be added to the observation vector alongside the “image-based” observations (cf. Sect. 7.1). The partial derivative \(\frac{\partial w}{\partial {t}_{2,x}}\) with respect to the unknown translation \({t}_{2,x}\) of component 2 in the x-direction is then, for example,

$$\begin{aligned} \frac{\partial w}{\partial {t}_{2,x}} = \left( \begin{array}{c}1\\ 0\\ 0\end{array}\right) \end{aligned}$$
(14)

Analogously, the partial derivative \(\frac{\partial w}{\partial {t}_{1,x}}\) for component 1 is

$$\begin{aligned} \frac{\partial w}{\partial {t}_{1,x}} = \left( \begin{array}{c}-1\\ 0\\ 0\end{array}\right) \end{aligned}$$
(15)

Similarly, the directional deviations between two components are formulated as observations. Depending on the weighting, the final geometry is oriented more towards the model assumptions or the image information. For more information, and especially for the derivations, we refer to Ermes (2000).
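For illustration, the axial-offset condition can be added to the adjustment as three pseudo-observations with very simple Jacobian blocks, as in this sketch:

```python
import numpy as np

def axial_offset_condition(E1, S2):
    """Pseudo-observations w = S2 - E1 (Eq. 13) for an offset-free
    transition between two components, together with the Jacobian blocks
    w.r.t. the translations of both components (Eqs. 14/15)."""
    w = S2 - E1            # should vanish; added with a weight of choice
    J_t1 = -np.eye(3)      # dw/dt_1, cf. Eq. (15)
    J_t2 = np.eye(3)       # dw/dt_2, cf. Eq. (14)
    return w, J_t1, J_t2
```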

7.3 Determination of the Elbow Radius

The determination of the unknown radius of an elbow has already been discussed in Sect. 5. If the radius is not based on the standards but is completely free, it can also be determined within the scope of an adjustment. Analogous to Sect. 5, it is assumed that the connection elements of the elbow are known. Thus, it is obvious to remove the unnecessary elbow parameters (translation, rotation, bending angle and diameter) from the adjustment and to leave only the radius as unknown. However, this alone does not lead to the desired result, since by default the starting point S of the elbow is fixed and can only be shifted via the translation parameters. Changing the radius would cause the end point E of the elbow to deviate from the reference axis (\(IP-E\)) of the connection element, as shown in Fig. 14a, which depicts the shift vectors \(\varDelta\), i.e. the partial derivatives \(\frac{\partial P}{\partial r}\), of a point P (or its projection onto the elbow axis) with respect to the radius r. Therefore, we modified the functional relation so that the intersection point \(IP\) does not undergo any displacement due to the radius change. As a result, the starting point S and the end point E are shifted only along the axes of the connecting elements (Fig. 14b).

Alternatively, the elbow can also be introduced into the adjustment with a full set of parameters. For the bending angle and the diameter, pseudo-observations are formulated with the existing approximate values or the normalised bending angle and diameter. These observations are given a high weight so that the associated unknowns effectively undergo no change. The freedom of the remaining translation and rotation parameters is constrained by formulating additional observations that keep the starting point S and the end point E on the associated axes \(IP-S\) and \(IP-E\), respectively.

Fig. 14 Shift vectors for elbow points due to a change in radius r

8 Automatic Transformation via Multiple Reconstructed Objects

Compared to laser scanners, which cover a 360\(^\circ\) panorama and thus a very large area, the measurement range of the three-camera measurement system is limited by the field of view of the cameras. However, primarily the geometries of individual components of the plant or pipeline are of interest (for manufacturing the insulation). Nevertheless, especially in the case of larger plant sections, it is necessary to capture the object from several points of view, which must then be merged by applying a transformation.

Typically, the transformation is carried out on the basis of control points. Alternatively, it can also be performed on the basis of reconstructed objects that are contained in both datasets. For example, two pipes can serve as a minimum configuration. Ideally, these should be perpendicular to each other, since it is then sufficient if the pipe axis was only reconstructed as a partial section. In this case, the reconstructed pipes do not necessarily have to extend to the actual end point of the pipe or have identical lengths.

8.1 Assignment of Corresponding Components

Before calculating the transformation, the correspondences between the two datasets or the objects they contain must first be established. In the simplest case, this is done manually by selecting identical objects.

Rabbani (2006) describes an automatic correspondence search based on reconstructed objects. He assigns objects by applying RANSAC and checks or rejects the result via the geometric relations among them. To verify the final assignment, he applies the transformation to the underlying point cloud and evaluates the distances between the points of the two datasets.

Since the presented measuring system contains an inclination sensor, the reconstructed objects are available in a roughly horizontal coordinate system; the transformation is thus essentially limited to the rotation around the vertical axis and the translation, and the inclination values have to be corrected only minimally. For the assignment of the components, the approach of Rabbani (2006) is adapted and simplified as follows:

  1. Random selection of a horizontally oriented pipe: a horizontal pipe is randomly selected in both datasets. The pipes must have the same diameter, and the value of the direction vector in the “height component” must be approximately the same. If the pipes have connections at both end points, their lengths must also match.

  2. Determination of the rotation around the vertical axis: using the direction vectors of the two pipes, the rotation around the vertical axis is determined (see the sketch below).

  3. Selection of another object pair: search for another pair of components (pipe, flange or elbow) whose axis or normal direction is approximately the same after applying the rotation. In addition, this direction should be as orthogonal as possible to that of the first pair.

  4. Determination of the translation: the translation of the transformation is determined based on the second pair of objects.

  5. Final verification using other objects: further correspondences are searched for and used to verify the assignment and the transformation.

The procedure is repeated until the assignment is verified. If an assignment does not meet the criteria, the iteration is aborted.
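A sketch of step 2, the rotation about the vertical axis from one pair of horizontal pipe directions, is given below; it assumes both datasets are roughly levelled by the inclination sensor, and the names are hypothetical.

```python
import numpy as np

def vertical_rotation(d_ref, d_src):
    """Rotation angle about the vertical (z) axis that maps the horizontal
    pipe direction d_src onto d_ref (step 2 of the assignment procedure).

    Only the horizontal components of the unit direction vectors are used;
    since pipe directions are sign-ambiguous, the angle is normalised to
    (-pi/2, pi/2].
    """
    angle = np.arctan2(d_ref[1], d_ref[0]) - np.arctan2(d_src[1], d_src[0])
    while angle <= -np.pi / 2:
        angle += np.pi
    while angle > np.pi / 2:
        angle -= np.pi
    return angle
```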

To calculate the transformation, at least two (preferably orthogonal) objects must have been reconstructed and assigned.

8.2 Pre-transformation

After matching the component axes, the next step is to calculate a pre-transformation. This is based on two linked component pairs, whereby attention is paid to the largest possible extension or axis length in order to achieve a good approximation. In the case of rectilinear components, the end points of the component axis cannot be used directly, since they may represent only a section of the component. Therefore, the transformation is performed using artificially generated points. For this purpose, the two axes are intersected in the reference dataset (\(Ref\)) and in the dataset to be transformed (T), respectively. At the distance d from the intersection point s, two points t1 and t2 are then interpolated on the axes (Fig. 15). The transformation is then calculated from the three points s, t1 and t2 (see the sketch below).
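From the three corresponding points, the pre-transformation can be computed, for example, with the Kabsch algorithm, as in this sketch (function name hypothetical); it returns a rotation and translation mapping the dataset T onto \(Ref\).

```python
import numpy as np

def rigid_transform_3pts(src, ref):
    """Rigid transformation (R, t) from corresponding points via the
    Kabsch algorithm, such that ref_i ~ R @ src_i + t.

    src, ref : (3, 3) arrays holding the points s, t1, t2 of the dataset
    to be transformed (T) and of the reference dataset (Ref), one point
    per row.
    """
    cs, cr = src.mean(axis=0), ref.mean(axis=0)
    H = (src - cs).T @ (ref - cr)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the SVD solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cr - R @ cs
    return R, t
```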

Fig. 15 Generation of the points t1, t2 for the pre-transformation by interpolation on the axes l1 and l2 at the distance d

8.3 Final Transformation

The pre-transformation is calculated on the basis of only two components, each of which is affected by deviations from the reconstruction process. These deviations in turn influence the transformation, so the pre-transformation can only be understood as a first approximation. Therefore, the transformation parameters are optimised within an adjustment that now includes the information of all corresponding components. The goal of the adjustment is the minimisation of the distances between the component axes of the two datasets.

The observations are the perpendicular distances \({d}_{P1}\) and \({d}_{P2}\) of the pre-transformed component end points to the axis of the corresponding reference object in the second dataset. The distances \({d}_{Pi}\) are split into two vectors \({o}_{1,Pi}\) and \({o}_{2,Pi}\) in order to establish a linear relationship. The vectors \({o}_{1,Pi}\) and \({o}_{2,Pi}\) are perpendicular to each other as well as to the direction vector \({r}_{ Ref }\) of the cylinder axis (Fig. 16).

The unknowns are the transformation parameters. Specifically, a translation vector and the rotation are estimated, the latter being represented singularity-free as a quaternion \(q({q}_{x},{q}_{y},{q}_{z},{q}_{w})\). While an Euler rotation involves rotations about the three coordinate axes, a quaternion performs a rotation about a single (arbitrary) axis. To avoid overparameterisation, the quaternion is normalised:

$$\begin{aligned} {q}_{x}^2+{q}_{y}^2+{q}_{z}^2 +{q}_{w}^2=1 \end{aligned}$$
(16)

For the partial derivative \(\frac{\partial {o}_{i,Pj}}{\partial {t}_{x}}\) of observation i at point \({P}_{j}\) with respect to the translation, the normalised distance vector \({n}_{i,Pj}\) is required (Eq. 18). For the x-direction of the translation, the partial derivative is obtained, for example, as follows:

$$\begin{aligned} \frac{\partial {o}_{i,Pj}}{\partial {t}_{x}} = {{n}_{i,Pj}}_{x} \end{aligned}$$
(17)

with

$$\begin{aligned} {n}_{i,Pj} = \frac{{o}_{i,Pj}}{||{o}_{i,Pj}||} \end{aligned}$$
(18)

The partial derivatives of the observations with respect to the elements of the quaternion \({q}_{k}\) are calculated successively. First, the partial derivative of the end points \({P}_{j}\) with respect to the quaternion element \({q}_{k}\) is determined:

$$\begin{aligned} \frac{\partial {P}_{j}}{\partial {q}_{k}} = \frac{\partial q}{\partial {q}_{k}} {P}_{j} \end{aligned}$$
(19)

Subsequently, \(\frac{\partial {P}_{j}}{\partial {q}_{k}}\) is projected onto the normalised distance vector \({n}_{i,Pj}\) (Eq. 18) to obtain the partial derivative of the observations:

$$\begin{aligned} \frac{\partial {o}_{i,Pj}}{\partial {q}_{k}} = \frac{\partial {P}_{j}}{\partial {q}_{k}} {n}_{i,Pj} \end{aligned}$$
(20)
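As a cross-check of the analytical derivatives in Eqs. (19) and (20), they can also be approximated numerically, for example as in this sketch (SciPy's quaternion order (x, y, z, w) is assumed):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def dP_dq_numeric(P, q, k, h=1e-7):
    """Finite-difference approximation of Eq. (19): derivative of the
    rotated point P w.r.t. the quaternion element q_k."""
    qp, qm = np.array(q, float), np.array(q, float)
    qp[k] += h
    qm[k] -= h
    # Re-normalise so the perturbed quaternions remain valid rotations
    # (cf. the normalisation condition, Eq. 16).
    Pp = Rotation.from_quat(qp / np.linalg.norm(qp)).apply(P)
    Pm = Rotation.from_quat(qm / np.linalg.norm(qm)).apply(P)
    return (Pp - Pm) / (2.0 * h)

def dObs_dq(P, q, k, n):
    """Eq. (20): projection of dP/dq_k onto the normalised distance
    vector n (= n_{i,Pj})."""
    return float(np.dot(dP_dq_numeric(P, q, k), n))
```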
Fig. 16 Determination of the distance vectors for an axis-based transformation

9 Results and Evaluation

In the following, the presented approaches and methods are evaluated both for their accuracy and for their robustness in terms of detection rate. With respect to accuracy, comparatively little information can be found in the previous work cited in Sect. 2, which primarily focuses on automatic detection and the detection rate. Therefore, the accuracy analysis is performed by comparison with the nominal geometry. For this purpose, the test dummy (Fig. 17) was scanned using the Leica Absolute Tracker AT960 in combination with the T-Scan 5 (Hexagon 2021), with an MPE of 50 \(\upmu\)m for a measured single point. The components were fitted into the point cloud using PolyWorks Inspector (InnovMetric 2023) and our own algorithms. For the transition into the coordinate system of the camera measurement system, seven control points were scanned (Fiedler et al. 2019) and a transformation was calculated based on these points. The mean 3D residual was 0.4 mm.

Fig. 17
figure 17

Measurement constellation with pipeline and control points for analysing the reconstruction accuracy and the transformation quality

9.1 Pipe Reconstruction

9.1.1 Reconstruction Accuracy

For the cylinder detection, the candidate pipes are first calculated from a single image using the pipe diameters from DIN (2003). Table 2 shows the deviations of seven pipes from the nominal geometry (from the laser tracker measurement); the reconstruction is based on a single image. The object distance is about 3.6 m and the pipe diameter 48.3 mm. On average, the position deviation is 46 mm (relative deviation: 1.29%) and the direction deviation is 3.6\(^\circ\). Similar values are achieved by Doignon and de Mathelin (2007), who also reconstruct cylinders with known radius from a single image. In general, the reconstruction of thin pipes at large distances is subject to larger deviations due to the poorer intersection geometry.

Table 2 Deviations between the single image reconstruction and the nominal geometry

The deviations between the automatically reconstructed cylinders and the nominal geometry are listed in Table 3. The cylinder parameters from the single-image reconstruction were refined by integrating the edges of the other two images, and the cylinders were then subjected to an adjustment according to Sect. 7.1. The improvement is clearly reflected in the deviations: the mean position deviation is 2.3 mm (relative deviation: 0.06%) and the direction deviation is 1.1\(^\circ\).

Table 3 Deviations between the automatically calculated pipes (using the three images) and the nominal geometry

A further improvement is obtained after the common adjustment (cf. Sect. 7.2) of all objects of the pipeline (Table 4). The mean position deviation from the nominal geometry is 1.1 mm (relative deviation: 0.03%) and the direction deviation is 0.3\(^\circ\). For comparison, Nguyen and Choi (2018) also achieve direction deviations of 0.2\(^\circ\)–0.3\(^\circ\) with a point cloud-based fitting. A similar result is obtained by Rabbani (2006) using a point cloud (standard deviation of the axis direction: 0.24\(^\circ\)), with distance deviations in the range of 1–2 mm. In his comparison, by contrast, photogrammetry with three images performs worse with respect to both parameters (direction and position) by about a factor of 5. Tangelder et al. (1999) reach accuracies of 1 mm and below for the position and 0.1\(^\circ\)–0.5\(^\circ\) for the direction. In general, the recording geometry should always be considered when comparing accuracies: depending on the intersection angles and acquisition distances, different results will be obtained.

Table 4 Deviations between the pipes after common adjustment and the nominal geometry

9.1.2 Recognition Rate

To analyse the detection rate, the method was applied to different images (from industrial plants and the test environment) and pipes. In total, it was tested on 85 pipes; Table 5 presents the results. 76% of the pipes were correctly detected. In the remaining cases, either no pipe or a wrong pipe was found. The algorithm has problems especially in noisy or badly illuminated images; accumulations of linear structures or objects are currently also a challenge and tend to cause false detections. Combined with the poorer initial single-image reconstruction of thin, distant pipes, the extraction in such cases is more difficult and more error-prone. The results, and thus the detection rate and the proportion of false detections, can be influenced by the following parameters:

  • Size of the search area for edge pixels in the other images after the single-image reconstruction

  • Deviation between the reconstructed pipe diameter and the nearest standard diameter (reconstruction over three images)

  • Maximum radius RMS (in object space) resulting from the edges in the images

However, these parameters were chosen such that, despite some false detections, the highest possible number of pipes is correctly detected. Overall, the proportion of correctly detected pipes is significantly higher than with the brute-force approach, making this method a considerable labour-saving solution. Object recognition based on machine learning could further increase the recognition rate (e.g. 95% according to Wang et al. (2022)) and fully automate the reconstruction. However, the training is much more time-consuming: in addition to obtaining a large amount of training data, it may also be necessary to fine-tune the training data for the specific use case. The presented approach, on the other hand, can be used universally for cylindrical objects (with a known diameter range).

Table 5 Recognition rate of pipes

9.2 Elbow Reconstruction

In contrast to the reconstruction of pipes, the adjacent components must already have been reconstructed before an elbow can be detected. If this is the case, however, the detection works fully automatically. The extraction can be influenced by the following parameters:

  • Maximum skew distance of the adjacent object axes

  • Maximum deviations of the adjacent object axes from the standard angle (90\(^\circ\),45\(^\circ\), etc.)

  • Linear distance of the connecting objects to the intersection point

Apart from the radius, the geometry of the elbow depends exclusively on the neighbouring elements; the recognition rate is therefore independent of the acquisition distance. The detection of the radius can be regarded as unproblematic due to the defined position of the elbow. Since the edge pixels are used to validate the presence of the arc at the corresponding position, significantly fewer false detections occur.
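
A minimal sketch of how the first two criteria from the list above could be checked for a pair of reconstructed pipe axes is given below. The thresholds and names are illustrative examples, not the values used in the software:

```python
import numpy as np

def elbow_plausible(p1, r1, p2, r2, max_skew=5.0,
                    standard_angles=(90.0, 45.0), max_angle_dev=3.0):
    """Check whether two pipe axes (point p, direction r) qualify as
    elbow neighbours: small skew distance of the axes and an angle
    close to a standard elbow angle. Thresholds in mm / degrees."""
    r1 = r1 / np.linalg.norm(r1)
    r2 = r2 / np.linalg.norm(r2)
    n = np.cross(r1, r2)
    if np.linalg.norm(n) < 1e-9:
        return False                  # parallel axes cannot form an elbow
    skew = abs(np.dot(p2 - p1, n / np.linalg.norm(n)))
    angle = np.degrees(np.arccos(np.clip(abs(np.dot(r1, r2)), 0.0, 1.0)))
    return (skew <= max_skew and
            any(abs(angle - a) <= max_angle_dev for a in standard_angles))
```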

9.3 Flange Reconstruction

Similar to the pipes, the nominal geometry for the flanges was calculated based on the laser tracker measurement and compared with the photogrammetrically reconstructed flanges.

The flanges are first calculated coaxially to the automatically extracted pipes. They therefore exhibit the same deviations orthogonal to the axis (mean: 2.1 mm) and of the axis direction (mean: 1.9\(^\circ\)) as the pipes (Table 6). The axial deviation (mean: 0.9 mm) depends on the manually selected position.

Table 6 Deviations between the flanges calculated axially to the automatically detected pipes and the nominal geometry

After the common adjustment, the deviations are reduced by about 50% (Table 7). On average, the deviations are 0.4 mm, 1.1 mm and 0.9\(^\circ\) (axial, orthogonal, direction). Accuracy considerations for flange reconstruction are rarely found in the literature. Maalek et al. (2019) achieve deviations from the nominal geometry of 2–4 mm in position and 0.5\(^\circ\) in direction for a point cloud-based fitting of the flange. However, they model the flange as a cylinder and do not recognise the flange type.

Table 7 Deviations between the flanges after common adjustment and the nominal geometry

Detection of the correct flange type depends primarily on the visible edges in the image. The best constellation is, therefore, a flange axis orthogonal to the imaging direction. Occlusions make the detection more difficult: if, for example, the neck of a weld neck flange is partially covered, no associated edges will be found in this area and the score for the weld neck flange will be lower. The position and alignment of the flange are also crucial; if deviations occur here, the wrong flange type may be determined. A further complicating factor is that the appearance of the individual flange types differs only minimally: the differences between the various flange types and their associated projections are much smaller than in the case of elbows. In contrast to the pipe and elbow detection, the flange detection works parameter-free with regard to the settings.

9.4 Automatic Transformation

Rabbani et al. (2007) also use a transformation based on reconstructed objects. However, they do not minimise the distances between the components but directly the deviations between the object parameters, so that all objects receive the same weight. Using the distances, on the other hand, implicitly weights the objects by their length: longer pipes, which are also reconstructed more accurately than shorter ones, receive a higher weight. Alternatively, the cylinder lengths could be introduced as weights in the approach of Rabbani et al. (2007).
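
If a parameter-based formulation were to be weighted by length, as suggested above, a diagonal weight matrix would suffice; a sketch under that assumption:

```python
import numpy as np

def length_weights(lengths):
    """Diagonal weight matrix for a parameter-based adjustment that
    gives longer (and hence more accurately reconstructed) cylinders
    a proportionally higher influence."""
    w = np.asarray(lengths, dtype=float)
    return np.diag(w / w.mean())      # normalised so the mean weight is 1
```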

Figure 18 shows the result of the transformation of three viewpoints for a dataset taken in an industrial plant. A total of nine images were taken from three viewpoints, and the reconstruction was performed separately for each viewpoint. Subsequently, the reconstructed objects were merged by transforming two of the viewpoints. The red dataset served as reference and was not transformed. For the green dataset, the transformation resulted in a maximum deviation (between the axes) of 5.5 mm and an RMS of 2.5 mm; for the blue dataset, the corresponding values were 4.2 mm and 1.4 mm.

Fig. 18
figure 18

Reconstructed objects from three points of view after applying a transformation

Since the components themselves are affected by deviations from the reconstruction, considering only the residuals or the orthogonal distances after the transformation is of limited significance. For a more comprehensive analysis, the quality of the transformation was therefore also evaluated on a second dataset using control points. The test object was captured and reconstructed from two viewpoints. In addition to the pipeline, the images also contain control points (Fig. 17), which were not moved relative to the pipeline between the two viewpoints. The transformation was calculated based on the pipeline and its underlying objects, with a maximum deviation of 2.4 mm and a mean axis deviation of 0.8 mm. The reconstructed control point coordinates were not used for this transformation. Instead, the transformation was applied to the control points, so that the residuals between the transformed points from dataset A and the original control points from dataset B can be used to evaluate the quality of the transformation. The deviations at the control points (Table 8) confirm the expected accuracy level. As assumed, the deviations in the z-direction (depth) are significantly larger than in the x- or y-direction (transverse directions): the reconstruction deviations, which are largest in the depth direction due to the height-to-base ratio at larger distances, propagate into the determination of the transformation parameters. In both cases, however, the deviations are acceptable and comparable to those of a control point-based transformation.
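
The evaluation step itself reduces to applying the estimated transformation to the control points of dataset A and differencing against dataset B. A sketch using SciPy's rotation class (scalar-last quaternion order (x, y, z, w), matching the parameterisation above; function and variable names are illustrative):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def control_point_residuals(q, t, pts_a, pts_b):
    """Apply the pipeline-based transformation (quaternion q,
    translation t) to the (N, 3) control points of dataset A and
    return the residuals against the reference points of dataset B."""
    transformed = Rotation.from_quat(q).apply(pts_a) + t
    return pts_b - transformed
```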

Table 8 Deviations between nominal and actual control point coordinates after transformation based on the pipeline

For the sake of completeness, it should be mentioned that the calculation of the control points is of course also subject to deviations, which in turn influence the coordinate comparison and the accuracy analysis. These deviations are, however, considerably smaller (Table 9): a second transformation based on the control points resulted in a mean 3D residual of 0.4 mm (Table 10). Here, too, the accuracy in the depth direction (z) is worse due to the intersection angle.

Table 9 Standard deviation of the reconstructed control point coordinates
Table 10 Mean residuals of the control point-based transformation

Generally, a greater or lesser degree of extrapolation occurs during the transformation. It depends on the acquisition configuration, the spatial extent of the doubly acquired objects that serve as the calculation basis for the transformation, and the dimensions of the reconstruction and transformation area. The accuracy of the transformation can therefore only be roughly estimated in general. However, the problem of extrapolation is also present with a control point-based transformation; longer pipelines may thus show larger deviations due to error propagation. For the present application, this is generally less relevant: the focus is primarily on the geometry of the components and the positional relationship of neighbouring components (for manufacturing the insulation). Applying the Level of Accuracy (LOA) of the U.S. Institute of Building Documentation (2016), LOA40 (1–5 mm) is to be selected in this context. The localisation in the plant is primarily of interest for logistics or assembly, for which a lower accuracy, e.g. LOA20 (15–50 mm) or even LOA10 (> 50 mm), would be sufficient. During assembly, deviations caused by the transformation can be compensated by the tolerances of the manufactured insulation (sheet metal shell).

Provided that parts of the plant have already been recorded and stored in a BIM, the transition to a plant coordinate system can be carried out by transforming via these objects.

10 Conclusion and Outlook

In this paper, different techniques for the automation of edge-based reconstruction were presented. Depending on the object type (e.g. elbow) and provided the general conditions are suitable (required neighbouring components available), a fully automatic reconstruction is possible. For straight pipes, which in contrast to pipe elbows are calculated independently, a fully automatic extraction and calculation with classical computer vision techniques is of limited use. Nevertheless, manual object selection in a single image is sufficient for reconstruction in most cases, so that the workload is halved compared to a stereo evaluation. Using standardised dimensions, the object parameters for dimension and appearance can usually be derived without user assistance. This makes the software more user-friendly and the technology as a whole accessible to less trained personnel.

If fully automatic reconstruction is desired for all object types and arbitrary configurations, however, techniques from the field of deep learning should be the focus. The selection of the individual components in the images could be performed using instance segmentation; for the reconstruction of the objects, the resulting image sections or masks are (in simplified terms) intersected three-dimensionally. In contrast to the presented “classical” computer vision techniques, however, the training of the neural network is much more time-consuming. In addition to obtaining a large amount of training data, a lot of time is also required for pixel-precise labelling of the images. It is currently also unclear how the recognition rate behaves under strong occlusions and/or poor lighting conditions, and in what way partially detected objects can be used for the reconstruction.

Extensive tests of the system in industrial plants could not yet be carried out. Nevertheless, the system proved to be suitably robust in smaller field tests, even under poor lighting conditions. Although the transformation is based on error-prone reconstruction data and is thus subject to error propagation, the average residuals are very small and comparable to those of a control point-based solution.