1 Introduction

Modular Digital Imaging Total Stations show a wide range of experimental applications in the fields of engineering surveying and metrology (Atorf et al. 2019; Guillaume et al. 2016; Wagner et al. 2014). Recent reviews have been presented by Paar et al. (2021) and Zschiesche (2022). In short, surveying instruments are developing into more powerful and user-friendly tools through automation and the addition of new sensor technology. This can be seen, for example, in automatic target recognition (ATR) or the autofocus of total stations. Extending a total station with one or more cameras also makes the possibilities of image processing available. A further advantage is that the measurement becomes independent of the subjective perception of the observer’s eye. However, the cameras used so far by instrument manufacturers primarily serve to improve interactive user workflows; they currently do not provide a real-time interface for user-specific image analysis or deep learning applications in the measurement process. Furthermore, their highest possible frame rate is significantly lower than the frame rate achievable by industrial cameras. For example, a multistation MS50 from the manufacturer Leica/Hexagon achieves 20 frames per second from a 5 MP coaxial camera and thus allows smooth user interaction. On closer inspection, the frame rate of 20 Hz is only achieved at the VGA resolution of the display, 640 × 480 pixels (Grimm and Zogg 2013). Saving a full 2560 × 1920 pixel image to an SD card usually takes more than 2 s with JPEG compression and even more than 6 s in raw format. To target applications that we believe require frame rates of roughly 1 Hz to 1000 Hz, we decided to continue with the concept of external cameras (Hauth et al. 2013), while integrating the motorised focus support of the multistations.
We consider it an advantage that the external camera does not disturb the thermal design of the multistation even at high pixel clock rates.

During the process of prototype development, different types of construction emerged. External implementations make it possible to mount the camera on the ocular or replace it. These are used in combination with commercial total stations or tacheometers and can be converted and adapted to the particular conditions and requirements. One example of such a modular system is DAEDALUS of ETH Zurich (Bürki et al. 2010; Charalampous et al. 2014; Guillaume et al. 2012, 2016). In this concept, a CCD chip replaces the eyepiece. The camera does not capture the crosshairs in the image. No additional optical component is added in between. This makes it necessary to attach a meniscus lens to the front of the telescope for distances of 13 m or more. This front lens shifts the focal plane to the image sensor for focused images. Similar to Huang and Harley (1989), the calibration is carried out by virtual control points, where the central projection is expressed by an affine approach. Instrument errors are not taken into account.

The system developed at the University of Zagreb attaches a GoPro5 directly to the ocular (Paar et al. 2017, 2021). Another setup where the camera is directly attached to the eyepiece can also be found in Schlüter et al. (2009). For the measurement with the image-assisted total station (IATS) at the University of Zagreb, videos are recorded and later split into images. Photo targets with predefined circles of known diameter and known distance between the circle centres are used, which enables the evaluation of the image data. For the frequency analysis, raw image coordinates are used and no camera calibration is required. However, the photo target must be attached to the object to be observed.

The second type of construction integrates the camera permanently into the instrument, as in commercial IATS. However, the image acquisition rate of commercial IATS is often too low for kinematic measurements (e.g. for frequency analysis in structural health monitoring). The fixed camera provides constant calibration parameters, as opposed to the modular version, which requires calibration after each reconfiguration. An early prototype is mentioned in Walser (2004), and the prototype series IATS2 from the manufacturer Leica in Reiterer and Wagner (2012), Wagner et al. (2013, 2016), Wasmeier (2009b). Walser (2004) describes the camera with an affine chip model and uses a combined approach to take camera and instrument errors into account. Wasmeier (2009a) shows a comparison of different methods.

The measuring system MoDiTa developed at i3mainz extends an existing instrument modularly by an external industrial camera. The self-calibration based on the photogrammetric camera model fully integrates the external camera into the measurement process. By permanently tracking the crosshair, the accuracy characteristics of the total station are maintained. In the following, we explain the measurement system and the calibration. We then discuss in more detail the image-based acquisition of the crosshair, which is required for the calibration and the further measurement process. The approach shows how a calibration can be calculated flexibly on site, using only software, with various cameras and total stations (compatible with the TCA, TPS, TS and MS series from the manufacturer Leica) and without any additional equipment.

2 Measurement System

The Modular Digital Imaging Total Station (MoDiTa) combines a high-end industrial camera with a digital total station in a modular and flexible way and is currently on prototype level (Fig. 1). As described in Hauth et al. (2013), the standard eyepiece of the total station is replaced by an industrial camera via a bayonet ring. To balance the weight of the camera, we attached a counterweight to the telescope. The cameras can be mounted in any rotation around the target axis by means of a simple clamping screw. By means of a corresponding adapter, the eyepiece camera used takes images directly from the crosshair plane. The crosshair is thus captured in every image. Among other things, this enables automatic, image-based targeting, which is within the accuracies of the total station (standard deviation according to ISO 17123-3 2001). With the help of template matching, non-signalled distinctive features are captured without contact. The use of the total station’s motorised autofocus is advantageous because, among other things, it enables simple self-calibration. After self-calibration, we calculate the corresponding horizontal or vertical angle for each point of interest in the image.

Fig. 1

The upper pictures show the ready-to-measure system MoDiTa in combination with a multistation MS50. The picture below shows the schematic structure of the eyepiece adapter for attaching the digital camera with the optics. The optics are attached to the eyepiece holder via an S-mount connection. This holder is connected to the total station via a bayonet connection for the eyepiece. The length of the eyepiece holder determines the magnification and thus how much of the crosshair is imaged onto the sensor. The digital camera is attached to the camera mount via a C-mount or CS-mount connection

Due to the modular design of the measuring system, a camera can be selected depending on the respective project requirements. Project requirements might include:

  • a monochrome, NIR or RGB (Bayer pattern) sensor,

  • low light suitability (usually by large pixel pitch) or high resolution,

  • a global or rolling shutter,

  • availability of a hardware trigger,

  • availability of line scan modes,

  • high frame rate (frames per second).

The industry standard C-mount used makes it easy to replace components. Depending on the industrial camera used, images can be captured in different modes. By selecting an area of interest (AoI), the range of captured lines and columns can be defined. In line-wise mode, only one line is captured over the width of the image. The data to be transmitted can thus be reduced, enabling a higher image capture frequency. A more detailed description can be found in Hauth et al. (2013).

3 Self-Calibration

To obtain measurement results within the measurement accuracy of the total station, calibration of the entire system is required. This is done by self-calibration directly on site. Given the speed of the self-calibration process, we do not intend to achieve repeatability of the calibration parameters of the camera in different setups. The aim is rather to be able to use the measuring system quickly and in an application-oriented manner. The determination of interpolable parameters for a particular combination of camera and total station was never attempted.

The user installs or replaces the camera on site and the measurement can be continued after calibration. Due to the simple mounting of the camera and the modular design, it is near impossible to recreate an identical setup. As a result, there are minimal differences in the optical path for each setup. Differences of several pixels in the image are possible.

Calibration is mainly carried out automatically and only needs to be operated manually by the user at the beginning. Before calibration, it is necessary to detect the crosshair to provide a reference image of the crosshair. The crosshair reference image ensures consistency of visual aiming through the eyepiece to camera-based aiming. Furthermore, the reference image of the crosshairs is used to correct any camera movements computationally, cf. Sect. 4. The telescope is moved relative to a fixed target point in such a way that the target point is imaged at favourably distributed locations on the image plane (Schlüter et al. 2009). This allows for the collection of data for an overdetermined linear system of equations. The software provides for different patterns with different distributions of the observation points in the image. The selection of patterns makes it possible to open up new fields of application in an applied, scientific environment by means of an inexpensive measuring system. For example, a comprehensive high-precision calibration with up to 36 measurements can be carried out for an investigation into atmospheric refraction. The implemented maximum number of observation points in the image is set to 9 points per quadrant (4 quadrants × 9 observation points = 36). It is possible to define fewer observation points, thus reducing the over determination. For a simpler example, see Fig. 11a. After the measurement, we have 12 images, each with the target at different positions in the image. In this case, we use a black and white laser-scanning target with a checkerboard pattern. For the automatic, rough approach of these target directions, the knowledge of a rough start transformation is sufficient, which only includes the camera constant and the rotation of the camera coordinate system around the target axis of the total station. We merely tilt the telescope to the side by a small, fixed amount to determine the start transformation. 
During the measurement, the software continuously observes the crosshair, the so-called matching. Due to the simple mounting of the camera, the crosshair is not in the centre of the image. It is also possible to rotate it around the optical axis.

In the context of self-calibration, we calculate the parameters via parameter estimation based on the least squares method. The functional model is based on the mapping relations between sensor space and object space (Walser 2004).

Optical distortion and a possible tilt of the camera are compensated by the distortion approach according to Luhmann et al. (2020). The unknowns of the adjustment are the angles \(\widetilde{H}\) and \(\widetilde{V}\) to the target, the rotation \(\kappa\), the camera constant c of the entire optics and the photogrammetric radial, tangential and asymmetric distortion parameters (A1, A2, A3, B1, B2, C1, C2). We describe the mapping onto the camera chip by a 2D transformation using a photogrammetric distortion model. \(\Delta x^{\prime}\) and \(\Delta y^{\prime}\) represent the distortion corrections, and \(x^{\prime}_{0}\) and \(y^{\prime}_{0}\) represent the principal point, which is linked to the detected crosshair. The angle readings to the fixed target are not measured directly, but result from the pixel coordinates of the reference crosshair. The index P denotes the measured values to the target point.

The unit vector to the searched target point is formed from the total station readings and the pixel position of the image point of one measurement:

$$\left[ {\begin{array}{*{20}c} {x^{\prime}} \\ {y^{\prime}} \\ c \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {x^{\prime}_{P} -x^{\prime}_{0} -\Delta x^{\prime}} \\ {y^{\prime}_{P} -y^{\prime}_{0} -\Delta y^{\prime}} \\ c \\ \end{array} } \right]$$

The corresponding point in object space is

$$\left[ {\begin{array}{*{20}c} {\overline{X}_{P} } \\ {\overline{Y}_{P} } \\ {\overline{Z}_{P} } \\ \end{array} } \right] = D\, R_{{IP_{P} }} R_{{IA_{P} }} {R_{{IP_{P} }}}^{T} R_{{H_{P} }} R_{{V_{P} }} R_{K} \frac{1}{{\left| {\left[ \ldots \right]} \right|}} \left[ {\begin{array}{*{20}c} {x^{\prime}} \\ {y^{\prime}} \\ c \\ \end{array} } \right],$$


$$\left| {\left[ \ldots \right]} \right| = \sqrt {x^{^{\prime}2} + y^{^{\prime}2} + c^{2} } .$$

The total vector in image space is normalised to unity, which is indicated by the division by \(\left| {\left[ \ldots \right]} \right|\). The spatial distance D is actually not required for the calibration. D prolongs the unit vector to the object point.

The matrix \(R_{K}\) describes the rotation of the camera sensor around the optical axis by the angle \(\kappa\).

$$R_{K} = \left[ {\begin{array}{*{20}c} {\cos \kappa } & {-\sin \kappa } & 0 \\ {\sin \kappa } & {\cos \kappa } & 0 \\ 0 & 0 & 1 \\ \end{array} } \right] $$
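The reduction of a measured pixel position to a unit vector in image space, followed by the rotation \(R_{K}\), can be sketched as follows. This is a minimal illustration and not the MoDiTa implementation: the function name `image_direction` is hypothetical, all quantities are assumed to be given in pixels, and the distortion corrections are passed in as already evaluated numbers.

```python
import math

def image_direction(x_p, y_p, x0, y0, c, kappa, dx=0.0, dy=0.0):
    """Unit image vector of one measurement, rotated by kappa (hypothetical helper)."""
    # Reduce the measured pixel position by principal point and distortion.
    x = x_p - x0 - dx
    y = y_p - y0 - dy
    # Length of the image vector; dividing by it normalises the vector to unity.
    norm = math.sqrt(x * x + y * y + c * c)
    ux, uy, uz = x / norm, y / norm, c / norm
    # Rotation about the optical axis (matrix R_K).
    ck, sk = math.cos(kappa), math.sin(kappa)
    return (ck * ux - sk * uy, sk * ux + ck * uy, uz)
```

The rotations from the circle and compensator readings would then be applied to this vector in the order given in the object-space equation.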

\(R_{{H_{P} }}\) and \(R_{{V_{P} }}\) follow from the graduated circle readings, which are supplied by the total station. The matrices describe the rotations required to transform the direction vector from the system of the total station into a superordinate coordinate system. Thus, if the instrument has been stationed beforehand, the coordinates are obtained directly in the system in use.

$$R_{{H_{P} }} = \left[ {\begin{array}{*{20}c} {\cos H_{P} } & {-\sin H_{P} } & 0 \\ {\sin H_{P} } & {\cos H_{P} } & 0 \\ 0 & 0 & 1 \\ \end{array} } \right]$$
$$R_{{V_{P} }} = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & {\sin V_{P} } & {\cos V_{P} } \\ 0 & {\cos V_{P} } & {-\sin V_{P} } \\ \end{array} } \right]$$

\(R_{{IP_{P} }}\) and \(R_{{IA_{P} }}\) describe the rotation of non-compensator corrected total station readings into compensator corrected ones (IP and IA are calculated from the inclination in the direction of the target axis and transversely to the direction of the target axis). \(\overline{X}_{P}\), \(\overline{Y}_{P}\) and \(\overline{Z}_{P}\) are calculated and denote the normalised direction vector to the target point.

By using (7) and (8), the angles \(\overline{H}_{P}\) and \(\overline{V}_{P}\) can be calculated.

$$\overline{H}_{P} = {\text{atan}}\left( {\frac{{\overline{X}_{P} }}{{\overline{Y}_{P} }}} \right),$$
$$\overline{V}_{P} = {\text{atan}}\left( {\frac{{\overline{X}_{P} /\sin \overline{H}_{P} }}{{\overline{Z}_{P} }}} \right)\;{\text{for}}\;\left| {\sin \overline{H}_{P} } \right| > \left| {\cos \overline{H}_{P} } \right|,$$


$$\overline{V}_{P} = {\text{atan}}\left( {\frac{{\overline{Y}_{P} /\cos \overline{H}_{P} }}{{\overline{Z}_{P} }}} \right)\;{\text{otherwise}}.$$
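The case distinction in the computation of \(\overline{V}_{P}\) avoids a division by a value close to zero. A minimal sketch follows, under two assumptions that are not stated in the text: the direction vector is taken as \((\sin V \sin H, \sin V \cos H, \cos V)\) with V as zenith angle, and `atan2` is used instead of `atan` so that the quadrant is preserved. The function name is hypothetical.

```python
import math

def direction_to_angles(X, Y, Z):
    """Horizontal direction H and zenith angle V (radians) of a direction vector."""
    # Assumption: atan2 instead of atan(X / Y) to resolve the quadrant of H.
    H = math.atan2(X, Y)
    # Use the numerically larger of sin(H) and cos(H) as divisor.
    if abs(math.sin(H)) > abs(math.cos(H)):
        horizontal = X / math.sin(H)
    else:
        horizontal = Y / math.cos(H)
    V = math.atan2(horizontal, Z)
    return H, V
```

For a vector built from H = 1.5 rad the sine branch is taken, for H = 2.5 rad the cosine branch; both return the input angles.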

The different telescope positions \(H_{P}\), \(V_{P}\) result in the corresponding image point \(x^{\prime}_{P}\), \(y^{\prime}_{P}\). From Eq. (2) follows \(\overline{H}_{P}\), \(\overline{V}_{P}\) to the target point. We use this concept for self-calibration. This means that the target point does not have to be aimed at directly, but is introduced via the unknowns \(\tilde{H}\) and \(\tilde{V}\).

$$\overline{H}_{P}- \tilde{H} = 0 + v_{{\overline{H}_{P} }} $$
$$\overline{V}_{P}- \tilde{V} = 0 + v_{{\overline{V}_{P} }} $$

The residuals model all stochastic influences jointly. For practical reasons, the stochastic portions of image coordinates, circle readings and compensator readings are not modelled separately from each other. Atmospheric flicker can be reduced by suitably grouping multiple exposures.

Here, the corrected tachymeter readings to the target point correspond to the direct measurement to the target. We did not differentiate between total station and camera-related corrections.

\(x^{\prime}_{{{\text{Ch}}}}\) and \(y^{\prime}_{{{\text{Ch}}}}\) represent the pixel position of the crosshair. Distortion and \(\kappa\) have no effect on the principal point.

$$R_{{\tilde{H}}} R_{{\tilde{V}}} R_{K} \frac{1}{{\left| {\left[ \ldots \right]} \right|}}\left[ {\begin{array}{*{20}c} {x^{\prime}_{Ch} { }- x^{\prime}_{0 }- 0} \\ {y^{\prime}_{Ch}- y^{\prime}_{0}- 0} \\ c \\ \end{array} } \right] = R_{{\tilde{H} }} R_{{\tilde{V}}} \left[ {\begin{array}{*{20}c} 0 \\ 0 \\ 1 \\ \end{array} } \right]$$

We consider the measurements to be independent and equally accurate. The weight matrix P = I therefore has ones on the main diagonal, or a zero where a measurement is identified as an error and is not to be included in the adjustment.

The Cholesky factorisation according to Förstner and Wrobel (2016) is used to solve the system of normal equations. The normal equation matrix is split into an upper and lower triangular matrix \(C\) and \(C^{T}\). We solve the system of normal equations by subsequent forward and backward substitution. This saves computing time because instead of the entire normal equation matrix N, only one triangular matrix needs to be inverted (Förstner and Wrobel 2016; Luhmann et al. 2020).

Since the residuals depend linearly on the unknowns, the estimate of the unknowns follows as

$$\hat{x} = \left( {A^{T} P A} \right)^{ - 1} A^{T} P l.$$

The unknowns and the termination criterion are calculated per iteration. Termination occurs after the limit has been reached.

$$x^{T} A^{T} Pl = l^{T} Pl - v^{T} Pv < { }0.00000001$$
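One linearised step of this adjustment can be sketched in plain Python. The sketch assumes a diagonal weight matrix given as a list `p` and uses a lower triangular factor L with N = L Lᵀ; the function names are hypothetical, and the design matrix in the usage note is a simple straight-line fit rather than the calibration model.

```python
import math

def cholesky_lower(N):
    """Lower triangular factor L of a symmetric positive definite matrix N."""
    n = len(N)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(N[i][i] - s)
            else:
                L[i][j] = (N[i][j] - s) / L[j][j]
    return L

def solve_normal_equations(A, l, p):
    """Solve (A^T P A) x = A^T P l for diagonal weights p; returns x and A^T P l."""
    m, u = len(A), len(A[0])
    N = [[sum(p[k] * A[k][i] * A[k][j] for k in range(m)) for j in range(u)]
         for i in range(u)]
    rhs = [sum(p[k] * A[k][i] * l[k] for k in range(m)) for i in range(u)]
    L = cholesky_lower(N)
    # Forward substitution: L y = rhs.
    y = [0.0] * u
    for i in range(u):
        y[i] = (rhs[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Backward substitution: L^T x = y.
    x = [0.0] * u
    for i in reversed(range(u)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, u))) / L[i][i]
    return x, rhs
```

With the observations l = (1, 3, 5, 100) at t = 0, 1, 2, 3, the model l = x₀ + x₁ t and the last observation excluded by a zero weight, the solution is x = (1, 2), and lᵀPl − xᵀAᵀPl equals vᵀPv.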

As a result, the adjusted direction angles to the target are provided. If required, a transformation into Cartesian coordinates can be carried out afterwards by means of a distance measurement. For this purpose, the total station is pointed directly at the determined target point and a reflectorless measurement is carried out. The measured distance is used to extend the unit vector to the target point.

4 Crosshair Tracking

Based on the modular adapter for mounting a camera, it is possible to capture the crosshair. The crosshair is a geodetic crosshair that is not located in the exact centre of the image. We distinguish between detecting and matching. Detection of the reference crosshair should take place as soon as possible after the camera is mounted. As with manual eyepiece adjustment, a monotone image background is preferred for this step, e.g. a sky or grossly out-of-focus image, so as to get an even background. The position and orientation of the crosshair in the pixel coordinate system of the camera is determined with the help of further crosses, which we refer to as (virtual) réseau crosses in the following. The software continuously observes the position of the crosshair during further measurement. Any image coordinate is transformed to the reference crosshair using 2D transformation including two translations and one rotation. Smaller deviations are recognised and taken into account by matching. If the change is too large, a new detection is necessary. Figure 9 provides a simplified overview of the single steps. In the following, we distinguish between the inner and the outer crosshair. The two inner lines mark the centre of the crosshair. The outer crosshair is composed of the six outer lines of the geodetic crosshair (Fig. 2). The elaborate modelling of the reference crosshair makes it possible to ensure a largely continuous tracking later on, even if line elements are only recognisable in parts.
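The 2D transformation with two translations and one rotation can be illustrated as follows. In this sketch a crosshair pose is represented as a tuple (x, y, θ) with the centre in pixels and the orientation in radians; this representation and the function name are assumptions made for illustration.

```python
import math

def to_reference(point, current, reference):
    """Map an image point from the current crosshair pose to the reference pose."""
    # Coordinates relative to the currently matched crosshair centre.
    dx = point[0] - current[0]
    dy = point[1] - current[1]
    # Rotation that aligns the current orientation with the reference orientation.
    dtheta = reference[2] - current[2]
    c, s = math.cos(dtheta), math.sin(dtheta)
    return (reference[0] + c * dx - s * dy,
            reference[1] + s * dx + c * dy)
```

A point lying 10 pixels along the horizontal line of the current crosshair is mapped to the point 10 pixels along the horizontal line of the reference crosshair.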

Fig. 2

Geodetic crosshair with pixel coordinate system. Also shown is the distinction between inner (red) and outer (green) crosshair

4.1 Crosshair Detection

During crosshair detection, first the outer crosshair with its six lines will be roughly determined. According to Steger (1996a, b, 1998), a Gaussian smoothing filter in combination with its partial directional derivatives is applied. According to Canny (1983), the smoothing and the threshold values are determined. If the value of the second partial derivative of an image point exceeds the upper threshold in a pixel, this is detected as a line point with sub-pixel accuracy. If the second partial derivative is smaller than the lower threshold, we dismiss the pixel. If the value is between both thresholds, the pixel is only used if it can be connected by detected line points. As a result, we obtain several line segments. These are then examined for false detections according to Haralick and Shapiro (1992) or Suesse and Voss (1993). By calculating a regression line through the image points of the lines, the mean distance of the individual points to the line can be calculated. We reject points with greater distance than the mean. The regression line is then calculated again. By defining a limit value for the direction difference and the distance of the end points of neighbouring lines, these are merged if necessary (Fig. 3). The regression line is then calculated again. This is repeated until the maximum values for the direction difference and the distance of the line end points are no longer undercut. The longest six lines of the calculations correspond to the outer geodetic crosshair. These are still in the form of polylines. They are calculated individually by adjustment according to Haralick and Shapiro (1992), Suesse and Voss (1993) as a straight line equation. In addition, the line width is determined for later use in the precise determination of the crosshair. Simplified, we calculate the width of the longest line by edge detection. Starting from a point on the line, we search the perpendicular distance on both sides up to the edge. 
The length of the perpendicular is determined for each pixel on the vector line. The mean value then gives the line width. The contour width can differ depending on the camera and the total station’s crosshair, but it must be at least one pixel wide in order to be detectable.
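The rejection of false detections by means of the mean point-to-line distance can be sketched as follows. The sketch assumes an orthogonal regression line through the centroid (the text does not specify the type of regression) and performs a single rejection step; the function names are hypothetical.

```python
import math

def fit_line(points):
    """Orthogonal regression line a*x + b*y + d = 0 with a^2 + b^2 = 1."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Direction of the principal axis of the point cloud.
    phi = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    a, b = -math.sin(phi), math.cos(phi)  # unit normal of the line
    return a, b, -(a * mx + b * my)

def point_distance(line, p):
    a, b, d = line
    return abs(a * p[0] + b * p[1] + d)

def fit_with_rejection(points):
    """Fit, reject points farther from the line than the mean distance, refit."""
    line = fit_line(points)
    dists = [point_distance(line, p) for p in points]
    mean = sum(dists) / len(dists)
    kept = [p for p, e in zip(points, dists) if e <= mean]
    return fit_line(kept), kept
```

For twenty points on the line y = 100 and one false detection at (10, 103), the false detection is rejected and the second fit reproduces the line y = 100.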

Fig. 3

Detection of the outer crosshair according to Canny (1983), Steger (1996a, b, 1998), and Suesse and Voss (1993). Solving the ambiguities using an example of an end of a line. a Line detection. b Merging lines based on limits. c Discarding falsely detected lines

The results are the start and end points, the straight line equation of the six crosshair lines and the line width.

We use the six compensated lines of the outer crosshair to determine the rough crosshair centre. All intersections of the straight lines are formed. We eliminate negative coordinates. By forming the median of the nine intersections, the rough centre is calculated. The maximum distance from the rough centre to the intersection points and the approach of an isosceles triangle are used to calculate the distance between two parallel crosshair lines.
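The determination of the rough centre can be sketched as follows. Lines are assumed to be given as coefficients (a, b, d) of a x + b y + d = 0. As an additional assumption of this sketch, exactly parallel line pairs are skipped; nearly parallel pairs intersect far away from the centre and are suppressed by the median. The function names are hypothetical.

```python
from itertools import combinations
from statistics import median

def intersect(l1, l2, eps=1e-9):
    """Intersection of two lines a*x + b*y + d = 0, or None if parallel."""
    a1, b1, d1 = l1
    a2, b2, d2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    return ((b1 * d2 - b2 * d1) / det, (a2 * d1 - a1 * d2) / det)

def rough_centre(lines):
    """Coordinate-wise median of all intersections with non-negative coordinates."""
    pts = [intersect(l1, l2) for l1, l2 in combinations(lines, 2)]
    pts = [p for p in pts if p is not None and p[0] >= 0.0 and p[1] >= 0.0]
    return median(p[0] for p in pts), median(p[1] for p in pts)
```

For three vertical lines at x = 95, 100, 105 and three horizontal lines at y = 195, 200, 205, nine intersections remain and the rough centre is (100, 200).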

For the definition of the inner crosshair, we define a circle with the centre equal to the roughly determined centre and the radius equal to the distance between two parallel crosshair lines. According to Luhmann et al. (2020), one-dimensional grey value profiles are formed perpendicular to the circumference.

These are obtained by averaging all existing grey values of a line that lies perpendicular to the circular ring. Using a Gaussian smoothing filter and a Laplace operator, we calculate the edge positions with sub-pixel accuracy. We compare the edge amplitudes with a previously defined threshold value. If the amplitude is greater than the threshold value, an edge is present at the corresponding image position. A total of eight points are detected on the inner crosshair, two points per line (Fig. 4a). If more than four lines intersect with the circular ring, only the best four edges are used. The selection is made via the calculated edge amplitude: the greater the amplitude, the higher the contrast of the contour at the image position. The start and end points of the edges per line are averaged so that there is one point for each inner line of the crosshair (Fig. 4b). For the orientation of the cross, the direction angle from the rough centre to the point with the largest edge amplitude is used.
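The sub-pixel edge localisation in a one-dimensional grey value profile can be sketched as follows. The sketch assumes a discrete Gaussian kernel with replicated border values, a second difference as the one-dimensional Laplace operator, and linear interpolation of its zero crossing; the edge amplitude is taken as the grey value difference of the smoothed profile across the crossing. Kernel size, threshold and function names are illustrative choices.

```python
import math

def smooth(profile, sigma=1.0):
    """Gaussian smoothing of a 1D profile with replicated border values."""
    radius = int(3 * sigma)
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(w)
    kern = [v / total for v in w]
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for k, wk in enumerate(kern):
            j = min(max(i + k - radius, 0), n - 1)
            acc += wk * profile[j]
        out.append(acc)
    return out

def subpixel_edges(profile, threshold, sigma=1.0):
    """Return (position, amplitude) of edges found at Laplace zero crossings."""
    g = smooth(profile, sigma)
    # lap[k] is the second difference at sample k + 1.
    lap = [g[i - 1] - 2.0 * g[i] + g[i + 1] for i in range(1, len(g) - 1)]
    edges = []
    for k in range(len(lap) - 1):
        if lap[k] * lap[k + 1] < 0.0:
            amplitude = abs(g[k + 2] - g[k + 1])
            if amplitude > threshold:
                # Linear interpolation of the zero crossing between two samples.
                t = lap[k] / (lap[k] - lap[k + 1])
                edges.append((k + 1 + t, amplitude))
    return edges
```

For an ideal step between samples 9 and 10, the sketch returns a single edge at position 9.5.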

Fig. 4

Rough detection of the inner crosshair. a Definition of an intersection circle around the rough centre. Edge detection following the circle. b Mean of the edge points

For the precise determination of the crosshair, we consider the lines individually and determine points at equal distances on each line. With the help of the direction angles, we form sectors of circles around the rough crosshair. Within two defined circles with different radii, the edges per pixel of the lines are detected again according to Luhmann et al. (2020) (Fig. 5a). As a result, the beginnings and ends of the edges of the inner crosshair are available for each line as coordinates between the two circles. These are averaged again so that the centre points of the line contours are available. Points with the same direction on the inner crosshair are combined. This results in two lines: one horizontal and one vertical. These are adjusted according to the same principle of Haralick and Shapiro (1992) and Suesse and Voss (1993) (Fig. 5c). The exact image coordinates of the crosshair centre are now available with sub-pixel accuracy via the point of intersection.

Fig. 5

Precise detection of the inner crosshair. a Definition of circles around the rough centre. Creating sectors. b Mean of the edge points. c Compensated lines with end points

We calculate réseau crosses for the alignment of the precise crosshair. This is done by calculating two points per line, which are defined by two circles with different radii (small circle: 3 times the distance of the parallel crosshair lines, large circle: shortest distance from the centre to the edge of the image reduced by 10%). The resulting intersections with the outer cross lines are all within the image area.
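The two radii and the intersection of a circle with a crosshair line can be sketched as follows. The sketch assumes that lines are given as a x + b y + d = 0 with a² + b² = 1 and that the image edges lie at 0 and at the image width and height; the function names are hypothetical.

```python
import math

def reseau_radii(centre, line_spacing, width, height):
    """Radii of the small and the large circle around the crosshair centre."""
    cx, cy = centre
    r_small = 3.0 * line_spacing
    # Shortest distance from the centre to the image edge, reduced by 10 %.
    r_large = 0.9 * min(cx, width - cx, cy, height - cy)
    return r_small, r_large

def circle_line_points(centre, radius, line):
    """Intersection points of a circle with a normalised line, or [] if none."""
    cx, cy = centre
    a, b, d = line
    s = a * cx + b * cy + d            # signed distance of the centre to the line
    if abs(s) > radius:
        return []
    fx, fy = cx - a * s, cy - b * s    # foot of the perpendicular on the line
    h = math.sqrt(radius * radius - s * s)
    return [(fx - h * b, fy + h * a), (fx + h * b, fy - h * a)]
```

For a centre at (320, 240) in a 640 × 480 image and a line distance of 12 pixels, the radii are 36 and 216 pixels.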

The smaller circle is intersected with the precise inner crosshair lines (Fig. 6a). Similar to the procedure already described for the rough determination of the inner crosshair, the edge contours are determined perpendicular to the line from the crosshair centre according to Luhmann et al. (2020) (Fig. 6b). The detection of the contours is again carried out via grey value profiles, and the start and end points are then averaged. We repeat the steps for the outer radius so that two points are available for each outer crosshair line. Finally, we sort the detected points into the correct quadrant, separately for the inner and the outer circle. The 12 points on the outer cross lines are thus determined precisely and are available in the correct position as a pair of points per line. One hundred further points are then determined between the two points of a line according to the procedure shown in Fig. 5. We chose this number so that a large number of points is available for the definition of the straight line and the subsequent adjustment; a different number is also possible. The one hundred points to be detected are distributed at equal intervals between the start and end point of a line. As in the previous step, the detection for each point of the line contour is performed perpendicular to the line. The points on the opposite crosshair lines are combined so that the final result is a horizontal and a vertical line (Fig. 7). The adjustment calculation is carried out according to the least squares method. We detect and eliminate outliers before the adjustment.

Fig. 6

Point detection of the outer crosshair using the example of a double line. a Intersection of the inner circle with an inner line. b Line points are formed via edge detection perpendicular to the intersection point

Fig. 7

Principle of adjustment (simplified example). a Before the adjustment. b After the adjustment

The calculation of the straight lines is based on the procedure according to Kampmann and Renner (2004), method 3. We transferred the model of the adjustment calculation mentioned to the model of the crosshair with two straight lines and adapted it to its special features. The double lines of the crosshair result in an additional unknown so that the functional model in coordinate form is as follows:

$$a_{1,2} *x_{i} + b_{1,2} *y_{i} + \left( {d_{1,2} \pm \delta_{1,2} } \right) = 0.$$

The variable δ corresponds to the half parallel distance of a double line to the adjusted straight line. These are chosen with different signs for the double lines and should be defined the same for both straight lines. The parameter δ is omitted for the single line. The equation is set up independently for the two lines, so that the parameters must be determined separately for each equation. The eight parameters of the two lines to be determined are listed in the unknown vector \(X\).
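The role of δ can be illustrated with a strongly simplified sketch. In contrast to the adjustment described here, the normal vector (a, b) is assumed to be known and fixed, so that only d and δ remain as unknowns. The observation equations d + δ = −(a x + b y) for points on one line of the double line and d − δ = −(a x + b y) for points on the other line then have the group means as least squares solution. The function name is hypothetical.

```python
def fit_double_line(normal, plus_points, minus_points):
    """Estimate d and delta of a*x + b*y + (d +/- delta) = 0 for a fixed unit normal."""
    a, b = normal
    # Right-hand sides of the observation equations for both lines.
    r_plus = [-(a * x + b * y) for x, y in plus_points]
    r_minus = [-(a * x + b * y) for x, y in minus_points]
    m_plus = sum(r_plus) / len(r_plus)      # estimate of d + delta
    m_minus = sum(r_minus) / len(r_minus)   # estimate of d - delta
    return 0.5 * (m_plus + m_minus), 0.5 * (m_plus - m_minus)
```

For a horizontal double line with points at y = 98 and y = 102 and the normal (0, 1), the sketch yields d = −100 and δ = 2, i.e. the adjusted straight line y = 100 with a half parallel distance of 2 pixels.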

The adjustment of both straight lines is done in one calculation. The approximate values for the variables \(d_{1,2}\) and \(\delta_{1,2}\) are determined empirically. b1 and a1 are given the value zero and are thus defined as parallel straight lines to the image coordinate system. Since the underlying equation of the functional model is linear, the partial derivatives with respect to the parameters are equal to the observations used. Simplified, the condition equation can be regarded as an observation equation for the software solution, but with a significantly higher weight than the observations. The weight for all observations equals 1, and the condition equation is given the weight \(10^{6}\) (Kampmann and Renner 2004). As previously described in the section on calibration, the solution of the system of normal equations is carried out by means of Cholesky factorisation according to Luhmann et al. (2020) or Förstner and Wrobel (2016). The adjustment is iterated until the termination criterion is reached. From the second iteration onwards, the estimated unknowns \(X\) are used as approximate values \({X}_{0}\) for the following iteration. For the determination of the two straight lines of the outer crosshair in the sub-pixel range with one decimal place, a few iterations are already sufficient. We choose the termination criterion in such a way that in normal cases only a few iterations are necessary. The three opposite lines with approximately the same orientation are adjusted to form a straight line (Fig. 7). The outer line cross has thus been determined precisely so that the réseau crosses can then be determined as described. These are located on the straight lines determined at this point and, together with the crosshair centre, define the precise position and orientation of the crosshair in the image coordinate system.

4.2 Crosshair Matching

Due to the possibility that the crosshair position changes after a longer period of time and when the telescope position changes, this must be determined continuously (Atorf et al. 2019). In the case of smaller movements of the crosshair, this can be matched by calculating a normalised correlation coefficient (NCC). The current centre of the crosshair in the image is compared with the last detected crosshair. The current position of the crosshair centre must be within a generated model in order to be matched. If the difference in position is too large, the crosshairs must be detected again. Matching is again carried out using the NCC procedure according to Luhmann et al. (2020). This procedure corresponds to a simplified detection of the precise crosshair centre since no homogeneous background is required throughout. The current crosshair centre should be able to be continuously tracked in the image during a measurement.

For the model image, we define a circle with 0.75 times the distance between the parallel lines around the precise centre of the crosshair. Within this image area, the similarity comparison is carried out by means of normalised cross-correlation. The model image, in this case the pixels of the circular area, is generated on several image pyramid levels and in different rotations. We add pyramid levels only as long as the top level still provides enough information about the image. The processing effort is higher than with other correlation methods due to the large number of images generated. However, for the selected image area and with today’s technology, this is not a disadvantage (Luhmann et al. 2020). Finally, the origin of the model image is set to the precise crosshair centre.
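The construction of a circular model region and of an image pyramid can be sketched as follows. This is only an illustration under stated assumptions: the factor 0.75 is interpreted as a radius, the pyramid is built by plain 2 × 2 averaging, and the stopping rule `min_size` stands in for the unspecified criterion of "enough information". All function names are hypothetical, and rotated model copies are omitted.

```python
def circular_mask(size, radius):
    """Boolean mask of a size x size window; True inside the circle."""
    c = (size - 1) / 2.0
    return [[(r - c) ** 2 + (k - c) ** 2 <= radius ** 2 for k in range(size)]
            for r in range(size)]

def model_radius(line_distance, factor=0.75):
    """Radius of the model circle (assumption: the factor 0.75 scales a radius)."""
    return factor * line_distance

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2 x 2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]

def build_pyramid(img, min_size=8):
    """Add levels while the next level would still be at least min_size pixels."""
    levels = [img]
    while min(len(levels[-1]), len(levels[-1][0])) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels
```

A coarse-to-fine search would start the correlation on the top level and refine the position on each finer level, which is what keeps the number of evaluated positions manageable.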

For the crosshair matching, the NCC model of the last detected crosshair centre and the image of the current crosshair are required. We define the search area for the inner crosshair as a circle around the last detected crosshair, with a radius of 1.5 times the distance between the parallel lines. This saves unnecessary computing time, since the centre of the crosshair cannot be located at the edge of the image. Then, according to Luhmann et al. (2020), all instances of the crosshair model are detected within the image section of the current crosshair. We only use the best instance, since the crosshair is unique in the image. The best instance is the one with the highest correlation coefficient, which can take values between zero and one. The calculation of the precise crosshair together with the resulting réseau crosses is then carried out according to the procedure already described. The software again saves the coordinates in a local file. However, the inner crosshair can also be determined mathematically by detecting the outer crosshair (Fig. 8). Because the crosshair lines and their geometric reference to the target axis are determined with sub-pixel accuracy, the intersection of the crosshair lines does not have to be determined anew in every image and may, for example, also be covered. In the user interface of the control software, coordinate differences between the last detected and the matched crosshair centre are displayed to the user (see Fig. 9).
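The search for the best instance within the circular search area can be sketched with a plain normalised cross-correlation. The sketch makes several simplifying assumptions: a square template instead of a circular model, no pyramid and no rotation, whole-pixel positions only, and hypothetical function names. The coefficient computed here lies between −1 and 1; negative values simply never win the search.

```python
import math

def ncc(template, patch):
    """Normalised cross-correlation coefficient of two equally sized windows."""
    t = [v for row in template for v in row]
    p = [v for row in patch for v in row]
    mt, mp = sum(t) / len(t), sum(p) / len(p)
    num = sum((a - mt) * (b - mp) for a, b in zip(t, p))
    den = math.sqrt(sum((a - mt) ** 2 for a in t) * sum((b - mp) ** 2 for b in p))
    return num / den if den > 0 else 0.0  # flat windows carry no information

def best_match(image, template, last_centre, search_radius):
    """Return (score, (row, col)) of the best instance inside the search circle."""
    th, tw = len(template), len(template[0])
    hr, hc = th // 2, tw // 2
    best = (-1.0, None)
    for r in range(hr, len(image) - (th - hr) + 1):
        for c in range(hc, len(image[0]) - (tw - hc) + 1):
            # Skip candidate centres outside the circle around the last position
            if (r - last_centre[0]) ** 2 + (c - last_centre[1]) ** 2 > search_radius ** 2:
                continue
            patch = [row[c - hc:c - hc + tw] for row in image[r - hr:r - hr + th]]
            score = ncc(template, patch)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

For a line distance of, say, 6 pixels, the call `best_match(image, template, last_centre, 1.5 * 6)` restricts the search to the circle described above. The difference between the returned centre and `last_centre` corresponds to the coordinate differences shown to the user.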

Fig. 8

a, b Goal of crosshair matching (simplified example). Matched points on at least two lines in two directions (a) or on one line in combination with a successful matching of the inner crosshair ensure tracking of camera motions. c, d Examples of an unmatchable inner crosshair due to dark background (c) and overexposure (d). Nevertheless, the inner crosshair is calculable by the detection of the outer crosshair

Fig. 9

A simplified overview of the single steps. In the following, we distinguish between the inner and the outer crosshair. The two inner lines mark the centre of the crosshair. The outer crosshair is composed of the six outer lines of the geodetic crosshair (Fig. 2)

5 Practical Applications

In the following, we cover exemplary studies on different applications of MoDiTa in the structural health monitoring (SHM) of existing structures.

5.1 Studies on Distance Independence

A practical application for IATS is the SHM of structures, such as factory chimneys, dams or bridges (Paar et al. 2021; Zschiesche 2022). What all these structures have in common is that they usually have elongated dimensions. Often it is impossible to stand directly perpendicular to the structure or to measure the entire structure from the same distance due to environmental conditions, such as rivers or railway lines (Fig. 10). Changes in the distance to the measured object lead to the refocusing of the optics and thus also to changes in the distortions. To discuss this aspect in more detail, we have carried out measurements from different distances to the instrument. For the measurement, we used a TS30 (Leica Geosystems AG 2009) and an industrial camera UI-3250 ML-M (IDS Imaging Development Systems GmbH 2015) with 1.92 MPixel.

Fig. 10

Exemplary view of two IATS (MoDiTa) on an elongated structure with different forced distances. Here, the observation of a reference point (green dashed line) and simultaneous observation of other monitoring points on the bridge (yellow) are shown

The measurement took place on 3 February 2022 in the courtyard of Mainz University of Applied Sciences between 10 am and 1 pm (CET).

We limited the distance to the range between 3.5 and 100 m. Over this range, calibration measurements were carried out with MoDiTa, with a sample of 12 measurements for each distance. These measurements were distributed over the entire image area. For comparison, we also applied the calibration of the measurement at 20 m distance to the measurements at shorter or longer distances. We compare the results in the form of residuals or deviations in Fig. 11, where the different positions that the IATS moves to are also visible.

Fig. 11

a A measurement image with displayed positions for self-calibration. The defined target is located at the positions marked in red. In this way, different positions (in this case 12) are approached across the image for adjustment. b The resulting residuals of a measurement at 20 m distance to the instrument. c The resulting deviations of a measurement at 70 m distance calculated with the self-calibration at 20 m distance to the instrument. d The resulting residuals of a measurement at 70 m distance to the instrument. e The resulting deviations of a measurement at 4.5 m distance calculated with the self-calibration at 20 m distance to the instrument. Clearly visible are the larger deviations compared to (b, c). f The resulting residuals of a measurement at 4.5 m distance

Figure 11b shows residuals of the 20 m measurement calculated with the 20 m calibration in the range of −0.3 to 0.2 mgon for the horizontal and zenith angle, which is within the expected angular accuracy of the measurement system. We therefore assume that the adjustment models the system successfully. In comparison, the deviations increase only slightly at a longer distance to the object. However, using the same calibration at a greater distance leads to significantly larger deviations in the marginal area of the measurement image (Fig. 11c). Close to the crosshair, which in this case is also close to the centre of the image, the deviations reach at most 0.3 mgon; in the outer area they range from −0.6 to 0.9 mgon horizontally and from −0.1 to 0.5 mgon in zenith distance. Therefore, the middle part is still modelled within the accuracy of measurement, but systematic effects can already be identified in the outer area. At distances of approximately 7 m and shorter, the deviations increase significantly. For clarification, Fig. 11e shows the measurement at 4.5 m distance. The deviations range from −0.7 to 1.1 mgon for the horizontal and zenith angle. Here again, the values are smaller towards the centre, but on average well above 0.3 mgon. The systematic deviations can be traced back to an unsuitable model of the adjustment. Walser (2004) and Wasmeier (2009a) achieved similar results with an integrated coaxial camera, so our results correspond to our expectations. We suspect the influence of a scale factor, possibly the optical system’s focal length. Figure 11d, f shows the residuals of the self-calibrations with the corresponding measurements. In both cases, the residuals remain within the measurement accuracy of the total station and do not exceed 0.3 mgon.

Figure 12 shows the estimated distortion parameters over the entire distance. The estimated nine parameters are presented with their respective accuracies as described in chapter 3. The calculation is made for the entire optics, i.e. for the telescope of the total station and the adapter used (Fig. 1). In the parameter estimation, we calculated a separate calibration for each distance and autofocus setting.

Fig. 12

Plot of the estimated parameters and a posteriori/empirical standard deviation

In addition to the individual distances, we gave the respective stepper motor positions since these reflect the respective required mechanical movements of the focusing optics, which are responsible for the changes in the distortion parameters (Wasmeier 2009a).

For the template matching with the cross-correlation method, a circular section with a radius of less than 40 pixels was used in each case. The maximum mean error of unit of weight from the adjustment is 0.32 mgon. The smallest value is reached during measurement at a distance of 20 m, where it is 0.13 mgon. All measured values of the 12 measurements were used, and no measured values were eliminated during the adjustment.

Table 1 shows the maximum and minimum estimated parameter values and empirical standard deviations corresponding to Fig. 12a–h.

Table 1 Results of the adjustment

All symmetric radial distortions show their minima and maxima at close range (Fig. 12a–c). Overall, the calculated empirical standard deviations of the tangential distortions B1 and B2 (Fig. 12d, e) show similar orders of magnitude and are more consistent than those of the symmetric radial distortions. Both B1 and B2 show their greatest value at a long distance, but fluctuate more in the near range.

The results show somewhat stronger changes of the parameters in the near range. This is in line with our expectations: with larger movements of the focusing optics, we anticipate correspondingly larger changes in the calibration parameters.

Overall, our studies show that micromovements of 0.5 mm/100 m can be resolved. The achievable accuracy depends on the atmospheric conditions and the illumination situation and decreases with increasing distance.
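The relation between the angular values in mgon and the lateral values in mm per distance can be checked with a short conversion. The sketch assumes the small-angle approximation (lateral offset equals angle in radians times distance); the function names are hypothetical.

```python
import math

MGON_TO_RAD = math.pi / 200000.0  # 400 gon = 2*pi rad, 1 mgon = 1e-3 gon

def mgon_to_mm(angle_mgon, distance_m):
    """Lateral offset in mm subtended by a small angle at a given distance."""
    return angle_mgon * MGON_TO_RAD * distance_m * 1000.0

def mm_to_mgon(offset_mm, distance_m):
    """Angle in mgon subtended by a lateral offset at a given distance."""
    return offset_mm / 1000.0 / distance_m / MGON_TO_RAD
```

Under this approximation, the residual bound of 0.3 mgon corresponds to roughly 0.47 mm at 100 m, which is of the same order as the stated 0.5 mm/100 m.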

The investigations indicate that in the case of typical distances to structures for SHM of 10 m or greater, it is sufficient to use a single self-calibration for a complete project.

5.2 Deformation Measurement on a Steel Bridge

The identification of dynamic structural characteristics is an important aspect of SHM. It enables the analysis of monitored vibrations and the calculation of natural frequencies. Changes in natural frequency indicate possible structural damage. Conventionally, acceleration sensors are attached to the structure, which requires a high labour input. MoDiTa enables high-frequency recordings of the movement behaviour without having to enter the structure. For the measurement of frequencies, the maximum number of frames per second (fps) is essential. Only with a sufficiently high sampling rate can the natural frequency be determined from the measured values and aliasing be ruled out. Bruschetini-Ambro et al. (2017) and Lachinger et al. (2022) do not recommend determining the damping from the excitation of a train crossing, because the selection of the ambient window has too great an influence on the result and the results therefore scatter too much.
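The sampling requirement can be made concrete with the standard folding relation: a frequency is reproduced correctly only if it lies below half the sampling rate, otherwise it appears at a lower, aliased frequency. The following sketch uses hypothetical function names, and the example frequencies in the note are illustrative values rather than measurements.

```python
def nyquist_frequency(fs):
    """Highest frequency in Hz that can be represented at sampling rate fs."""
    return fs / 2.0

def apparent_frequency(f_true, fs):
    """Frequency in Hz at which a tone of f_true Hz appears after sampling at fs Hz."""
    return abs(f_true - fs * round(f_true / fs))
```

For example, a 3.9 Hz vibration sampled at 500 Hz appears at 3.9 Hz, whereas the same vibration sampled at only 5 Hz would appear at about 1.1 Hz.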

To capture the deformation behaviour of a bridge during a crossing, we observed a steel bridge using MoDiTa (Fig. 13a, b). We used a Leica TS60 (Leica Geosystems AG 2020) in combination with an industrial camera UI3080 CP-M with 5.04 MPixel (IDS Imaging Development Systems GmbH 2016). The distance to the bridge was approximately 10 m. The recording frequency was 500 Hz with an exposure time of 0.002 s. To achieve this high frequency, we only observed an area of interest. The observed point is located in the upper centre of the western bridge. We attached no targets to the structure. Figure 13c shows the recorded deformation over time. For further analysis, we used the ambient window (shown in Fig. 13c by a green box). The ambient window captures the oscillation behaviour of the bridge without additional mass from the train.

Fig. 13

a, b A steel bridge (30 m span) for freight and passenger traffic. c The recorded measured values in vertical direction. 25 s of the recording, corresponding to 12 500 observations, are shown. For the evaluation of the vibration behaviour, we evaluated the ambient window. d The result of the fast Fourier transform (FFT) with a main peak at 3.9 Hz (approximate value). e A best-fit evaluation of the ambient window with a natural frequency at 3.8 Hz (adjusted value)

Figure 13d shows the results of the Fast Fourier Transform (FFT) of the ambient window to determine the natural frequencies. Only the results of the low frequencies are shown. A clear peak is visible at 3.9 Hz.
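How a dominant frequency is read from an FFT can be sketched with a small, dependency-free radix-2 implementation applied to a synthetic signal. The signal parameters (128 Hz sampling, 512 samples, a 3.9 Hz tone) are assumptions chosen so that the sample count is a power of two; they are not the recorded data, and the function names are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    half = n // 2
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + tw[k] for k in range(half)]
            + [even[k] - tw[k] for k in range(half)])

def dominant_frequency(samples, fs):
    """Return (peak frequency, bin spacing) in Hz, ignoring the DC bin."""
    n = len(samples)
    if n < 2 or n & (n - 1):
        raise ValueError("number of samples must be a power of two")
    mean = sum(samples) / n
    spectrum = fft([s - mean for s in samples])
    k_peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return k_peak * fs / n, fs / n

def synthetic_vibration(f, fs, n, delta=0.0):
    """Synthetic damped cosine used only to exercise the functions above."""
    return [math.exp(-delta * i / fs) * math.cos(2 * math.pi * f * i / fs)
            for i in range(n)]
```

The peak can only fall on a multiple of the bin spacing `fs / n`, so the FFT result is an approximate value whose resolution is limited by the length of the evaluated window.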

To calculate the damped oscillation using a least squares fit, we evaluated the measured values from second 19.0 to 20.0 (Fig. 13e). The result is a calculated natural frequency of 3.79 Hz. The mean error of unit of weight is estimated at 0.046 mm/10 m. This corresponds to the expected resolution and accuracy, cf. Sect. 5.1.
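A least squares estimate of a damped oscillation can be sketched as follows. This is not the adjustment used for Fig. 13e, whose functional model is not given here. As a simple substitute, the sketch scans a grid of frequencies and damping constants and, for each pair, solves the remaining linear problem for the two amplitude coefficients in closed form. The model, the grid and all names are assumptions; the test signal is synthetic.

```python
import math

def fit_amplitudes(t, y, f, delta):
    """Linear least squares for a, b in y = exp(-delta*t)*(a*cos(wt) + b*sin(wt))."""
    w = 2 * math.pi * f
    c = [math.exp(-delta * ti) * math.cos(w * ti) for ti in t]
    s = [math.exp(-delta * ti) * math.sin(w * ti) for ti in t]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    cy = sum(u * v for u, v in zip(c, y))
    sy = sum(u * v for u, v in zip(s, y))
    det = cc * ss - cs * cs  # determinant of the 2 x 2 normal equations
    a = (cy * ss - sy * cs) / det
    b = (sy * cc - cy * cs) / det
    sse = sum((yi - a * ci - b * si) ** 2 for yi, ci, si in zip(y, c, s))
    return a, b, sse

def fit_damped_oscillation(t, y, freqs, deltas):
    """Grid search over frequency and damping; returns the best-fitting parameters."""
    best = None
    for f in freqs:
        for d in deltas:
            a, b, sse = fit_amplitudes(t, y, f, d)
            if best is None or sse < best[0]:
                best = (sse, f, d, a, b)
    sse, f, d, a, b = best
    s0 = math.sqrt(sse / (len(t) - 4))  # empirical standard deviation, 4 unknowns
    return {"f": f, "delta": d, "a": a, "b": b, "s0": s0}

def synthetic_window(f, delta, amplitude, phase, fs=500.0, duration=1.0):
    """Synthetic 1 s window sampled at 500 Hz, used only to exercise the fit."""
    t = [i / fs for i in range(int(fs * duration))]
    y = [amplitude * math.exp(-delta * ti) * math.cos(2 * math.pi * f * ti + phase)
         for ti in t]
    return t, y
```

Unlike the FFT peak, the fitted frequency is not tied to the bin spacing of the window, which is why a fit over a short window can still resolve the frequency to two decimal places.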

Both evaluations come to almost the same result and confirm each other.

6 Conclusion and Outlook

In this article we explain how the general setup of the measuring system in combination with the developed software detects the crosshair and thus performs self-calibration. The self-calibration achieves accuracies within the measuring accuracy of the total station. We made assumptions, for example that the lines of the crosshair are parallel and that six precise lines represent the outer crosshair. We have not investigated either assumption further.

The algorithms explained track the crosshair position and orientation correctly and with sub-pixel accuracy, even when the inner crosshair cannot be found by means of cross-correlation. Because the crosshair lines and their geometric reference to the target axis are determined with sub-pixel accuracy, the intersection point of the crosshair lines does not have to be determined again in every image but may also, for example, be covered. This offers the measuring system more flexibility.

In the practical application explained above, we show results of several calibration calculations. We have shown that it is not necessary to calculate a separate photogrammetric calibration for each distance to the object. For the combination of camera and total station used, distance ranges can be defined which, for example, meet the requirements of SHM and can be covered with one calibration of the optical system. As the difference between the measurement distance and the calibration distance increases, so do the angular deviations. Generally, the deviations towards the edge of the measurement image are larger. This shows that the distortion model varies significantly, especially at close range. We assume that the changed camera constant c is responsible for the systematic deviations. The use of one calibration can save a lot of time during the measurement, as it is no longer necessary to measure and evaluate the calibration pattern again.

Furthermore, we carried out an exemplary deformation measurement on a steel bridge and demonstrated the successful use for the determination of natural frequencies. Due to the high recording frequency, it is possible to record the vibration behaviour. Further research is needed in this area, such as the comparison to other sensor systems.