1 Introduction

Non-destructive testing (NDT) methods, such as ultrasonic testing (UT) or impact echo testing, play an important role in assessing the remaining service life of civil infrastructure. Failures, such as the collapse of the Ponte Morandi bridge in Genoa (Italy), can have fatal consequences [1]. Thus, the service life assessment requires high reliability, which is only achievable with knowledge about the reliability of the applied inspection methods. However, reliability assessments of NDT methods are still very rare in civil engineering [2,3,4]. Typically, they are performed on mock-ups specifically designed for this purpose [5]. These mock-ups should contain artificially produced defects whose sizes cover the entire transition zone between detectable and non-detectable. For example, probability of detection (POD) analysis requires \(80\mathrm{\%}\) of the data points to have a detectability between \(10\mathrm{\%}\) and \(90\mathrm{\%}\) [6]. Thus, a priori information on defect detectability is required for the design of such mock-ups.

In addition to reliability assessments, information on defect detectability makes such mock-ups invaluable for the training of inspectors [7]. To test an inspector’s expertise during training, the defects must be detectable without their detection being trivial. Furthermore, capability studies that test the effectiveness of an NDT method for a given problem also require mock-ups that are specifically designed for this purpose [8].

This study presents an approach to estimate defect detectability for ultrasonic testing in concrete with numerical simulations. To do so, the simulations need to be validated beforehand to ensure that the results are realistic. For the simulations to be realistic, they need to incorporate noise, which is the most limiting factor for defect detection. In concrete, most of the noise during ultrasonic testing originates from elastic wave scattering at the interior boundaries between aggregates, pores, and cement matrix [9]. Therefore, a numerical representation of the concrete is presented and validated first. Afterwards, it is tested whether the sources for UT are simulated correctly by investigating the radiation characteristics of dry point contact transducers for homogeneous and concrete-like media. With this validation of the code, it is possible to conduct simulations of different defect scenarios. The results are then processed using the total focusing method [10] and evaluated visually just like real inspection data. This evaluation provides the basis for the design of a mock-up with honeycombs whose detection is non-trivial. At the end of this study, inspection results obtained from the designed mock-up are presented, evaluated, and compared to the results predicted by the simulations.

2 Elastodynamic Finite Integration Technique

The simulation code used for this study is based on the elastodynamic finite integration technique (EFIT) [11]. EFIT is a staggered-grid finite-difference scheme for the time-explicit simulation of elastic wave propagation, which is very robust for strongly heterogeneous media [12] and has been used in the past for the simulation of UT inspections in concrete [13]. The code was implemented in Fortran according to the version found in [14] and parallelized using the Message Passing Interface (MPI). This version incorporates the Kelvin–Voigt model of viscoelasticity, so that additional wave damping can also be simulated. It was implemented in 2D and 3D so that it can be adapted to the simulation problem. The parallelized Fortran implementation divides the simulation domain into multiple subdomains, so that each process performs the simulation on a small subdomain. Boundary values are exchanged via MPI, which, in combination with the small local simulation domains, allows for fast computations, especially on the high-performance computers of the Helmut Schmidt University.

In the code used here, the integration of Cauchy’s equation of motion and the deformation rate equation [11] use a time step that is directly connected to the largest wave propagation speed \({c}_{max}\) and the spatial resolution of the used grid \(\Delta x\). To ensure numerical stability, a time step \(\Delta t\) of

$$ \Delta t = \frac{1}{4} \frac{\Delta x}{{c_{\max } }} $$

is chosen throughout this study. With a typical P-wave speed in concrete of \({c}_{max}=4000\,{\text{m}}/{\text{s}}\) and a spatial resolution of \(\Delta x=1\,\text{mm}\), this results in a time step of \(\Delta t= 62.5\,\text{ns}\). Therefore, several thousand iterations are necessary to simulate a UT measurement.
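As a quick numerical check of the stability criterion above, the following Python sketch evaluates the time step for the quoted values and counts the iterations needed for one recording. The function name `efit_time_step` and the recording length of 500 µs are illustrative assumptions and are not taken from the EFIT code.

```python
import math


def efit_time_step(dx, c_max, safety=0.25):
    """Time step dt = safety * dx / c_max (safety = 1/4 as chosen in this study)."""
    return safety * dx / c_max


dx = 1e-3        # spatial resolution in m
c_max = 4000.0   # largest wave speed in m/s (typical P-wave speed in concrete)
dt = efit_time_step(dx, c_max)

# Assumed recording duration of a single A-scan (hypothetical value).
record_length = 500e-6
n_steps = math.ceil(record_length / dt)

print(f"dt = {dt * 1e9:.1f} ns, iterations = {n_steps}")
```

Halving the grid spacing halves the time step, so the number of iterations doubles in addition to the growth in the number of grid cells.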

2.1 Numerical Concrete Model

To perform NDT simulations that explore defect detection limits, a suitable numerical model for concrete is mandatory to simulate noise. Here, the concrete model described by Schubert & Köhler (2004) [9] is used, in which aggregates and pores are represented as ellipsoids. The maximum aspect ratio of each ellipsoid is two. In the model, the aggregate size follows a given grading curve, while the pore size remains adaptable. Both pores and aggregates are distributed randomly with a random spatial orientation inside the simulation domain. To fill the simulation domain with the ellipsoids, the largest objects are placed first. The sizes then gradually decrease until a given share of the medium is occupied. Aggregates smaller than the spatial resolution cannot be resolved and are excluded from the simulations. Therefore, the ratios in the grading histogram (see Fig. 1, left) remain unchanged. In the end, the cement matrix fills the rest of the medium. Material parameters are assigned to each grid cell separately, allowing each aggregate to have different mechanical properties. Pores are simulated as voids, so that no elastic waves can propagate inside them.

Fig. 1 Exemplary grading histogram for a B16 concrete (left) and numerical concrete model with honeycomb (right)

A control sequence that identifies and avoids overlapping aggregates and pores must be implemented efficiently, as the generation of the numerical concrete can take a lot of time, especially in 3D [15]. In the implemented algorithm, the maximum half axis is determined for each ellipsoid, and the overlap check is then performed within a cube centered on the designated center of the aggregate or pore, whose side length is twice the maximum half axis. EFIT only allows for rectangular grid cells. As all materials are treated as isotropic, cubic grid cells are used for all simulations. With such a mesh, all geometries inside the simulation domain are generated using a staircase approximation. Therefore, aggregates and pores are not perfectly ellipsoidal. An example of the numerical concrete model can be found in Fig. 1 (right).
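The placement procedure described in this section can be illustrated with a strongly simplified 2D sketch in Python. It is not the Fortran implementation used in this study: the function names, the rejection loop with `max_tries`, and all parameter values are assumptions for illustration, and the grading curve is replaced by a plain list of major half axes given in grid cells.

```python
import math
import random


def ellipse_cells(cx, cy, a, b, theta, nx, ny):
    """Grid cells inside a rotated ellipse (staircase approximation)."""
    r = int(math.ceil(max(a, b)))  # maximum half axis in cells
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cells = []
    # Only the bounding square (side 2*r) around the center is scanned.
    for i in range(max(0, cx - r), min(nx, cx + r + 1)):
        for j in range(max(0, cy - r), min(ny, cy + r + 1)):
            dx, dy = i - cx, j - cy
            u = dx * cos_t + dy * sin_t    # coordinates in the ellipse frame
            v = -dx * sin_t + dy * cos_t
            if (u / a) ** 2 + (v / b) ** 2 <= 1.0:
                cells.append((i, j))
    return cells


def fill_domain(nx, ny, major_half_axes, target_share, seed=0, max_tries=200):
    """Place ellipses, largest first, until target_share of the cells is occupied."""
    rng = random.Random(seed)
    grid = [[0] * ny for _ in range(nx)]   # 0 = cement matrix
    occupied, label = 0, 0
    for a in sorted(major_half_axes, reverse=True):
        if occupied >= target_share * nx * ny:
            break
        for _ in range(max_tries):
            cx, cy = rng.randrange(nx), rng.randrange(ny)
            b = a / rng.uniform(1.0, 2.0)      # aspect ratio of at most two
            theta = rng.uniform(0.0, math.pi)  # random orientation
            cells = ellipse_cells(cx, cy, a, b, theta, nx, ny)
            # Overlap check: every candidate cell must still be cement matrix.
            if all(grid[i][j] == 0 for i, j in cells):
                label += 1
                for i, j in cells:
                    grid[i][j] = label
                occupied += len(cells)
                break
    return grid, label


grid, n_placed = fill_domain(80, 80, [10, 8, 6, 6, 4, 4, 4, 3, 3, 3],
                             target_share=0.4, seed=1)
labels = {c for row in grid for c in row if c > 0}
```

Restricting the check to the bounding square keeps the cost of each placement attempt independent of the domain size, which is the motivation for the cube-based check in 3D.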

Note that the numerical model does not include interfacial transition zones [16]. As this finite-difference code operates on a Cartesian grid, it is not computationally feasible to simulate such small regions, because resolving them would require a much finer resolution of the overall grid.

2.2 Validation of Simulation Results

Using this numerical concrete model, validation of the code is mandatory before designing the mock-up. As the mock-up shall be designed for UT, it must be ensured that the sources as well as the noise are simulated correctly. Therefore, a recreation of a real UT inspection from a reference specimen is performed to validate the used concrete model. Afterwards, radiation characteristics for ultrasound transducers are simulated and compared to analytical solutions.

2.2.1 Recreation of Ultrasound Data

The first validation experiment is the recreation of a UT inspection on a reference specimen with an embedded cladding pipe. This specimen was produced by the Fraunhofer Institute for Nondestructive Testing IZFP and has dimensions of \(80 \times 50 \times 38\,{{\text{cm}}}^{3}\) [17]. The pipe has a diameter of \(5\,{\text{cm}}\) and its center has the yz-coordinates (24.5 cm | 19.5 cm). For the inspection, two longitudinal ultrasound transducers (ACS S1803) with a center frequency of \(100\,{\text{kHz}}\) were used in a source-receiver configuration with a Morlet wavelet as the excitation signal. Both source and receiver were placed along a line encircling the specimen at \(x = 40\,{\text{cm}}\). For a visualization of the specimen and the measurement positions see Fig. 2.

Fig. 2 Measurement positions for the specimen used for validation of UT simulations (left) and distances to the source point (8) during the simulated experiment (right)

Simulations are performed in two ways: once using a homogeneous concrete model and once using the heterogeneous concrete model described in Sect. 2.1. In both models the P-wave velocity is set to \({c}_{P} = 3700\,{\text{m}}/{\text{s}}\), while the shear wave velocity is \({c}_{S} = 2300\,{\text{m}}/{\text{s}}\). For the homogeneous model, a constant density of \(\rho = 2400\,{\text{kg}}/{{\text{m}}}^{3}\) is used. In the heterogeneous model this density is used for the cement matrix only, whereas aggregates have densities varying between \(\rho_{agg} = 1400 {-} 3400\,{\text{kg}}/{{\text{m}}}^{3}\) to allow for scattering at the interfaces. For the volume and shear viscosities, values of \({\eta }_{V}=600\,\text{Pa}\,\text{s}\) and \({\eta }_{S}=170\,\text{Pa}\,\text{s}\) are chosen, respectively. The porosity is \(2\,\mathrm{vol.}\,\%\) and pore sizes vary uniformly between \(1.2\,{\text{mm}}\) and \(2.4\,{\text{mm}}\). In these simulations the grid cell size is fixed at \(0.6\,{\text{mm}}\), resulting in a total of 1340 × 840 × 640 (≈ 720 million) grid cells for the 3D simulation. Free boundaries are used to simulate the surfaces as well as the interfaces with the pipe and pores.
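The grid dimensions quoted above can be checked with a few lines of Python. The sketch also estimates how many grid cells sample the S-wavelength at the center frequency of 100 kHz used in this experiment. Treating the S-wave at the center frequency as the reference wavelength is a simplifying assumption, since the excitation wavelet also contains higher frequencies and Rayleigh waves are slightly slower.

```python
# Grid dimensions and material parameters as stated in the text.
nx, ny, nz = 1340, 840, 640
dx = 0.6e-3      # grid cell size in m
c_s = 2300.0     # shear wave velocity in m/s
f_c = 100e3      # center frequency of the transducers in Hz

n_cells = nx * ny * nz

# Assumption: the S-wavelength at the center frequency is used as reference.
wavelength_s = c_s / f_c
cells_per_wavelength = wavelength_s / dx

print(f"{n_cells / 1e6:.0f} million cells, "
      f"{cells_per_wavelength:.1f} cells per S-wavelength")
```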

The results of the two simulations can be compared to the measurement result in Fig. 3. Here, the source position is kept constant (Position 8, see Fig. 2), while the receiver moves around the specimen. From the comparison of the three results, it is evident that both simulations reproduce the correct onsets for P- and S-waves. After picking the onsets for P-, S-, and Rayleigh waves using the Hilbert transform, the average velocity of each wave type can be calculated for each dataset. This results in P-wave velocities of \(3600\,\text{m}/\text{s}\) in the measurement, \(3700\,\text{m}/\text{s}\) in the homogeneous simulation, and \(3580\,\text{m}/\text{s}\) in the simulation using the concrete model. The homogeneous simulation can be seen as a reference in this case, as the determined velocity should agree with the material parameters. The reduction of this velocity in the concrete model is an effect caused by scattering at interior interfaces within the concrete [18]. S-wave velocities show a similar trend for the simulations (homogeneous: \(2300\,\text{m}/\text{s}\), concrete model: \(2210\,\text{m}/\text{s}\)), but here the velocity obtained from the measurement (\(2380\,\text{m}/\text{s}\)) is slightly higher than the simulated ones. For the Rayleigh wave, the same behavior is observable (measurement: \(2090\,\text{m}/\text{s}\), hom. sim.: \(2060\,\text{m}/\text{s}\), concrete model: \(1960\,\text{m}/\text{s}\)). This alignment confirms the strong correlation between Rayleigh wave velocity and S-wave velocity. Since the measurement data contain a lot of additional noise, likely caused by the measurement equipment, a direct comparison of phases between all three datasets is not conducted. Both simulations show that the Rayleigh wave would propagate around the entire specimen. Such a feature cannot be observed in the measurement data.

Apart from that, the heterogeneous simulation is also able to reproduce the noise present in the measurement data with realistic amplitudes. In Fig. 3, the amplitudes of the first P-wave arrival can be directly compared to one another. The overall distribution of amplitudes is relatively similar for all three scenarios. However, near the edges between the side walls and the surface on which the source is applied, the small P-wave amplitudes predicted by the simulations cannot be observed in the measurement data (receivers 15–20 and 50–54). Since the radiation characteristic of a P-wave transducer shows amplitudes close to zero at these angles (\(\approx 90^\circ \)) [19], it is likely that the high amplitudes observed in the measurement data are caused by noise or surface waves. Because of this radiation characteristic, identifying a distinct P-wave onset in the measurement data is challenging at these traces. Also, some amplitude variations occur in the measurement data due to different coupling of the sensors at each position. Both the measurement and the homogeneous simulation results show a local minimum in amplitude around trace 35. This feature is not observable in the heterogeneous simulation, where a local maximum can be observed at this position. Such a focusing effect could result from the random nature of the pores in the simulation, which will differ from the actual pore distribution in the specimen. A detailed investigation of the effect of heterogeneous simulation domains on UT amplitudes is beyond the scope of this paper. Overall, it can be concluded that the EFIT code recreates the actual measurement well in all facets except for the surface wave propagation, which differs strongly from the measurement data. A full validation of the correct reproduction of UT amplitudes cannot be achieved due to the uncertainties in the measured and simulated amplitudes explained above.

Note that the damping parameters are chosen in such a way that they are fitted to the experimental data. For the ratio between volume and shear viscosity, the value from [9] is used. These damping parameters will later be used for the simulation of defect scenarios in Sect. 3.1.
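The wave velocities reported above for the measurement and both simulations can be put side by side with a short Python sketch. It computes the relative deviation of each simulated velocity from the measured one and the ratio of Rayleigh to S-wave velocity per dataset. The dictionary layout and function name are illustrative assumptions; the velocity values are those given in the text.

```python
# Average velocities in m/s as reported in the text (P, S, and Rayleigh waves).
velocities = {
    "measurement":      {"P": 3600.0, "S": 2380.0, "R": 2090.0},
    "homogeneous sim.": {"P": 3700.0, "S": 2300.0, "R": 2060.0},
    "concrete model":   {"P": 3580.0, "S": 2210.0, "R": 1960.0},
}


def relative_deviation(value, reference):
    """Deviation of value from reference in percent."""
    return 100.0 * (value - reference) / reference


reference = velocities["measurement"]
deviation = {
    name: {wave: relative_deviation(v[wave], reference[wave]) for wave in v}
    for name, v in velocities.items()
    if name != "measurement"
}

# Ratio of Rayleigh to S-wave velocity for each dataset.
rayleigh_to_shear = {name: v["R"] / v["S"] for name, v in velocities.items()}

for name, dev in deviation.items():
    print(name, {wave: f"{d:+.1f} %" for wave, d in dev.items()})
print({name: round(r, 3) for name, r in rayleigh_to_shear.items()})
```

For these values, the Rayleigh-to-shear ratio lies between roughly 0.88 and 0.90 in all three datasets, which is consistent with the correlation noted above.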

Fig. 3 Comparison of measured signal (top, left), simulated signal using the homogeneous concrete model (top, right) and simulation using a heterogeneous model (bottom, left). In the bottom right the comparison between the P-wave amplitudes is shown

2.2.2 Radiation Characteristics of Ultrasound Transducers

In the past, dry point contact transducers have proven to be very efficient tools for UT in civil engineering [20]. Theoretically, the radiation characteristics of such transducers should equal those of single force point sources [19]. However, in a validation experiment, large deviations from the analytical solutions were measured for S-waves, while the measured radiation characteristics for P-waves agreed with the theory [21]. To test if radiation characteristics can be recreated accurately in simulations, wave propagation is simulated in 2D and 3D at different distances from the source. Also, it is investigated how the heterogeneous mesostructure of concrete influences radiation characteristics through scattering. This approach ensures the accuracy of the point sources utilized for ultrasonic testing simulations later on.

Using 2D simulations in a homogeneous, fictitious medium (\({c}_{P} = 4000\,\text{m}/\text{s}\), \({c}_{S} = 2000\,\text{m}/\text{s}\), \(\rho = 2400\,\text{kg}/\text{m}^{3}\)), the behavior of relative wave amplitudes at different distances from the source is studied. The grid cell size is \(\Delta x=3.125\,\text{mm}\) with 4000 × 2000 cells in total, and only the top surface is simulated as a free boundary. Absorbing boundaries are used on all other sides, so that no wave modes can interfere with each other or alter the amplitudes. No additional damping is used for these simulations, as amplitudes shall not be altered. A Ricker wavelet with a center frequency of 40 kHz is used as the input wavelet. P- and S-wave amplitudes are measured at different azimuth angles relative to the source at different stages of the wave propagation [22]. Here, it is evident that P-wave amplitudes converge quickly towards the analytical solution, while S-wave amplitudes have not yet converged at 43 wavelengths from the source (see Fig. 4). However, the results suggest that the radiation characteristic for S-waves could eventually converge at even greater distances. Note that the plots depicted in Fig. 4 differ from those found in [22]. This is due to a previous error in the calculation of the theoretical radiation characteristic. The amplitudes obtained from the simulations remain unchanged.
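For reference, a Ricker wavelet with center frequency \(f_c\) is commonly written as \(r(t) = (1 - 2\pi^2 f_c^2 t^2)\,e^{-\pi^2 f_c^2 t^2}\). The Python sketch below evaluates this standard definition for the 40 kHz used here and relates the wavelengths to the grid spacing of the 2D simulations. The delay `t0`, the function name, and the number of samples are assumptions; the exact wavelet parametrization in the EFIT code may differ.

```python
import math


def ricker(t, f_c, t0=0.0):
    """Ricker wavelet with center frequency f_c (Hz), centered at t0 (s)."""
    a = (math.pi * f_c * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


f_c = 40e3                   # center frequency in Hz
c_p, c_s = 4000.0, 2000.0    # wave speeds of the fictitious medium in m/s
dx = 3.125e-3                # grid cell size in m

lambda_p = c_p / f_c                 # P-wavelength in m
lambda_s = c_s / f_c                 # S-wavelength in m
cells_per_lambda_s = lambda_s / dx   # grid cells per S-wavelength

# Sampled source signal; time step from dt = dx / (4 c_max), delay t0 assumed.
dt = 0.25 * dx / c_p
t0 = 1.5 / f_c
wavelet = [ricker(n * dt, f_c, t0) for n in range(400)]
```

With these values, the S-wavelength of 5 cm is sampled by 16 grid cells.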

Fig. 4 Radiation characteristics of P-waves (left) and S-waves (right) for a shear wave transducer at different distances from the source

The S-wave radiation characteristic obtained at a distance of 6.25 wavelengths from the source also matches well with the experimentally obtained radiation characteristic [21] when neglecting the high amplitudes caused by superposition with Rayleigh waves at angles close to \(\pm 90^\circ \). This leads to the conclusion that the radiation characteristic does indeed change over the traveled distance. From the obtained waveforms (see Fig. 5), it is observed that the complex radiation characteristics for S-waves are generated by interference between the S-wave and the head wave, which approaches the S-wavefront at the critical angle \(\varphi = {{\text{sin}}}^{-1}({c}_{S}/{c}_{P})\) [23].
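As a worked example, the critical angle can be evaluated for the velocity pairs used in this study. The short Python sketch below does so for the fictitious medium of the 2D simulations and, for comparison, for the concrete parameters of Sect. 2.2.1; the function name is an illustrative assumption.

```python
import math


def critical_angle_deg(c_s, c_p):
    """Critical angle phi = arcsin(c_S / c_P) in degrees."""
    return math.degrees(math.asin(c_s / c_p))


# Fictitious medium of the 2D simulations (c_S = 2000 m/s, c_P = 4000 m/s).
phi_fictitious = critical_angle_deg(2000.0, 4000.0)

# Concrete parameters from Sect. 2.2.1 (c_S = 2300 m/s, c_P = 3700 m/s).
phi_concrete = critical_angle_deg(2300.0, 3700.0)

print(f"{phi_fictitious:.1f} deg, {phi_concrete:.1f} deg")
```

For the fictitious medium, the velocity ratio of 0.5 gives a critical angle of 30°.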

Fig. 5 S-wave fronts at 25 wavelengths away from the source for a longitudinal wave transducer (left) and shear wave transducer (right)

Using 3D simulations, it is possible to obtain radiation characteristics for homogeneous as well as for heterogeneous concrete-like media. Here, a comparison with analytical representations of radiation characteristics is not possible, as the analytical solutions are only valid in 2D. From the EFIT simulations, it is found that the complex radiation characteristics for S-waves also occur in 3D for homogeneous media. However, for heterogeneous media, these effects are largely eliminated by wave scattering. Graphical representations of the 3D radiation characteristics can be found in [22].

Overall, EFIT simulations can recreate 2D radiation characteristics well. Due to a lack of computational power and precision, a convergence towards the analytical solution can only be confirmed for P-waves. When simulating S-waves at large distances, even with double precision, the wavefront tends to deviate from a perfect half-circle, which is the reason why radiation characteristics at larger distances are not evaluated.

3 Design of Experiment for UT Capability Demonstration

As the simulation code is now validated for UT simulations in concrete, it can be used to design a mock-up with defect sizes and depths in the transition zone of detectability. According to Berens (2000) [6], at least \(80\%\) of the data points used for a POD analysis, which is the most common form of reliability analysis, require a detectability between \(10\%\) and \(90\%\). This condition underlines the need to estimate defect detectability from simulations, which supports the design of complex defect scenarios.
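The design condition stated above can be expressed as a simple check on a list of estimated detectabilities. The following Python sketch is only an illustration: the function names are hypothetical, the bounds are treated as inclusive by assumption, and the detectability values are made-up placeholders rather than results of this study.

```python
def fraction_in_transition(detectabilities, low=0.10, high=0.90):
    """Share of data points whose detectability lies within [low, high]."""
    inside = [p for p in detectabilities if low <= p <= high]
    return len(inside) / len(detectabilities)


def satisfies_pod_design_rule(detectabilities, required_share=0.80):
    """True if at least required_share of the points lie in the transition zone."""
    return fraction_in_transition(detectabilities) >= required_share


# Hypothetical detectability estimates for ten planned defects.
planned = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.97]

share = fraction_in_transition(planned)
ok = satisfies_pod_design_rule(planned)
print(f"{share:.0%} of the planned defects lie in the transition zone: {ok}")
```

In practice, the detectability of each planned defect is not known in advance, which is why estimates from pre-simulations are needed to populate such a list.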

To design the mock-up for this study, UT pre-simulations on a numerical model of the mock-up are carried out beforehand. The approximate defect sizes and depths at which detection is difficult are then estimated from the simulation results. In this study, the simulation data are processed using the total focusing method (TFM) [10], which is also used for measurement data, to generate the images that would be manually interpreted in a real inspection. In a feasibility study, it was shown that this design approach works for a specimen with honeycombs by estimating their impact solely from raw simulation data [24]. However, the specimen produced in the feasibility study is quite small (\(50 \times 50 \times 17\,{{\text{cm}}}^{3}\)) and only contains three honeycombs of the same size at the same depth, meaning that a full capability demonstration of UT could not be performed with these results. Therefore, a larger specimen is needed.

3.1 Design of the Reference Specimen

Similar to the mock-up produced in the feasibility study [24], this mock-up also contains honeycombs, all of which are spherical to reduce the degrees of freedom. However, as the number of honeycombs shall be larger, this necessarily implies a larger thickness. Here, a thickness of \(30\,{\text{cm}}\) is chosen. As the lateral dimensions of the mock-up need to be large enough to fit all defects, they are chosen to be \(1.5\,{\text{m}}\) in x- and y-direction. This way, good transportability of the mock-up within the lab is also ensured, as the total weight is kept under \(2000\,{\text{kg}}\). Additionally, the aspect ratio is more favorable for other NDT methods such as impact echo [25], which might also be applied to this mock-up in the future.

To estimate honeycomb detectability, many test simulations with differently sized defects at different depths are carried out. In this scenario, a spatial resolution of \(\Delta x=1\,\text{mm}\) is selected and free boundaries are present on all sides of the specimen. The chosen wave speed values are \(4000\,\text{m/s}\) for P-waves and \(2650\,\text{m/s}\) for S-waves. The density of the cement matrix is set to \(2000\,\text{kg}/\text{m}^{3}\), while the density of the aggregates ranges between \(2000\,\text{kg}/\text{m}^{3}\) and \(3000\,\text{kg}/\text{m}^{3}\). Damping parameters are defined as \({\eta }_{V}=530\,\text{Pa}\,\text{s}\) and \({\eta }_{S}=136\,\text{Pa}\,\text{s}\), and the porosity of the concrete is \(2\,\mathrm{vol.}\,\%\).

The simulations should represent a possible testing scenario on the real specimen as closely as possible. Therefore, the simulation setup is based on the testing equipment that shall be used to carry out the inspections. Here, this equipment consists of a Proceq Pundit PD8000, which uses a maximum center frequency of \(65\,{\text{kHz}}\). This frequency should therefore have the highest chance of detecting honeycombs and is used for the simulation for that reason. The Pundit PD8000 uses a square wave as the excitation pulse. However, square pulses are not band-limited and would therefore cause numerical dispersion or even instabilities in the simulation, which is why a Ricker wavelet with the abovementioned center frequency is chosen instead. The Pundit operates 24 transducers in eight separate groups of three. During every measurement, each group acts as a source once, while the other seven are recording, which means that eight simulations are required to obtain one TFM result. SH waves are used for the measurements. Schematic visualizations of the simulation setup are shown in Fig. 6.

Fig. 6 Schematic visualization of the simulation setup. Left: 3D visualization of the simulated problem. For reasons of clarity, only the rebar located directly above the honeycomb is shown. Right: 2D top-view of the relative source and receiver positions with respect to the rebar and honeycomb. During each “shot” in a real measurement, each transducer group will be source once

For post-processing of the simulation results, a simple version of the total focusing method (TFM) is used, in which the signals are corrected for their travel time delay and their Hilbert transforms are summed up without any weighting [10]. The direct SH-wave is muted manually in the signals to avoid artifacts caused by its high amplitudes.
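The unweighted delay-and-sum principle described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the processing chain used in this study: it sums the analytic signals (signal plus i times its Hilbert transform) and takes the magnitude, assumes a constant wave speed and transducers on the surface \(z = 0\), and omits the muting of the direct wave. The function names, the array geometry, and the synthetic point reflector are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via FFT; its imaginary part is the Hilbert transform."""
    n = x.shape[-1]
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * h, axis=-1)


def tfm_image(signals, src_x, rec_x, xs, zs, c, dt):
    """Unweighted delay-and-sum of all source-receiver traces."""
    analytic = analytic_signal(np.asarray(signals, dtype=float))
    n_samples = analytic.shape[1]
    X, Z = np.meshgrid(xs, zs)               # pixel grid, shape (len(zs), len(xs))
    image = np.zeros(X.shape, dtype=complex)
    for trace, sx, rx in zip(analytic, src_x, rec_x):
        # Travel time source -> pixel -> receiver for a constant wave speed.
        t = (np.hypot(X - sx, Z) + np.hypot(X - rx, Z)) / c
        idx = np.rint(t / dt).astype(int)
        valid = idx < n_samples
        image[valid] += trace[idx[valid]]
    return np.abs(image)


# Synthetic test: eight source groups, each recorded by the seven others.
c, dt, n, f_c = 2650.0, 1e-6, 400, 65e3      # S-wave speed, sampling, frequency
groups = np.linspace(-0.105, 0.105, 8)       # assumed group positions in m
target_x, target_z = 0.0, 0.10               # hypothetical point reflector
t_axis = np.arange(n) * dt
signals, src_x, rec_x = [], [], []
for i, s in enumerate(groups):
    for j, r in enumerate(groups):
        if i == j:
            continue
        tau = (np.hypot(target_x - s, target_z)
               + np.hypot(target_x - r, target_z)) / c
        a = (np.pi * f_c * (t_axis - tau)) ** 2
        signals.append((1.0 - 2.0 * a) * np.exp(-a))   # Ricker echo
        src_x.append(s)
        rec_x.append(r)

xs = np.linspace(-0.10, 0.10, 41)
zs = np.linspace(0.02, 0.20, 37)
image = tfm_image(signals, src_x, rec_x, xs, zs, c, dt)
iz, ix = np.unravel_index(np.argmax(image), image.shape)
```

In this synthetic case, the 56 traces add up coherently only at the reflector position, so the image maximum is expected near \(x = 0\) and \(z = 10\,\text{cm}\).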

Honeycombs are assumed to be regions with a low-density cement matrix (\({\rho }_{cem} = 400\,{\text{kg}}/{{\text{m}}}^{3}\)) [18, 26], which shall resemble the large pore clusters found in a real honeycomb (see Fig. 1, right). To simulate different noise levels and testing conditions, some simulations have a rebar mesh (\(12\,\text{mm}\) diameter, \(20\,\text{cm}\) mesh size) placed inside the medium. One of the rebars is placed directly above the honeycomb with a concrete cover of \(44\,\text{mm}\) (see Fig. 6). This is viewed as a more complex testing scenario, which is used to estimate the upper detection limits, whereas a concrete specimen without any steel reinforcement is viewed as an easier testing scenario and is therefore used to estimate the lower detectability limits.

To decide which honeycomb sizes shall be placed at which depths, three depth regions are considered separately: \(5{-}10\,{\text{cm}}\), \(10{-}15\,{\text{cm}}\), and \(15{-}20\,{\text{cm}}\). Numerical simulations are carried out using different honeycomb sizes at depths of \(8\,{\text{cm}}\), \(12\,{\text{cm}}\), and \(17\,{\text{cm}}\), which lie approximately at the center of each region. The decision process is visualized for a depth of \(12\,{\text{cm}}\) in Fig. 7 as an example. To estimate the lower detection limit, the less noisy setup without steel reinforcement is used. During a real inspection, honeycombs can be detected by a decreased backwall echo amplitude, by a shifted backwall echo, or by a direct reflection from the honeycomb itself [8]. In the comparison between a simulation without defect and a honeycomb size of \(6\,{\text{cm}}\), it can be observed that the honeycomb only causes a slight reduction in backwall echo amplitude, while no direct reflection is visible. As the backwall echo is nevertheless visibly reduced, a detection of this honeycomb could be possible under favorable conditions, in which noise amplitudes are low so that the backwall echo has stable amplitudes in the undamaged regions. For the upper detection limit, the setup with a rebar directly above the honeycomb is used. For a honeycomb with a diameter of \(12\,{\text{cm}}\), a clear direct reflection is identifiable. However, the amplitude of this reflection is similar to that of the rebar reflection. The backwall echo again shows clearly reduced amplitudes, but the echo is still continuous, so that there should be a small chance that this defect is overlooked.

Fig. 7 TFM results of simulation data. Three simulations were conducted without steel reinforcement (SR) with different honeycomb sizes (6 cm and 12 cm diameter). For reference a simulation without defect is included. To estimate the upper detection limit, a simulation setup with steel reinforcement (44 mm concrete cover) is used

The abovementioned decision process is also applied to the other depth regions. There, it is found that honeycombs with diameters of \(4{-}8\,{\text{cm}}\) should be placed at depths between \(5\,{\text{cm}}\) and \(10\,{\text{cm}}\), and honeycombs with diameters between \(6\,{\text{cm}}\) and \(9\,{\text{cm}}\) at depths of \(15{-}20\,{\text{cm}}\). With this knowledge, a total of 36 honeycombs are produced using pervious concrete. These honeycombs have diameters between \(4\,{\text{cm}}\) and \(12\,{\text{cm}}\) with an increment of \(1\,{\text{cm}}\), so that four honeycombs of each size exist.

For the steel reinforcement, rebars with a diameter of \(12\,{\text{mm}}\) and an irregular mesh with sizes varying from \(120\,{\text{mm}}\) to \(300\,{\text{mm}}\) are used. This way, different reinforcement levels can be incorporated into the measurement by placing honeycombs at different lateral positions in the mock-up. Additionally, the steel reinforcement mesh at the bottom mirrors the one at the top, so that the reinforcement level is different for each honeycomb depending on whether the measurement is performed on the front or the back side. A 3D model of the mock-up without honeycombs as well as a honeycomb map can be found in Fig. 8.

Fig. 8 3D-Model of the mock-up without honeycombs (left) as well as a map with all honeycomb locations and positions at which UT measurements were conducted (right)

3.2 Specimen Production

The production of the specimen starts with the honeycombs, as artificially produced honeycombs are typically produced separately from the rest of the concrete [27]. For the honeycombs, pervious concrete is poured into plastic molds to obtain the desired dimensions. The pervious concrete is produced using only large aggregates (8–16 mm) and just enough cement paste to bind the aggregates together. After hardening, the honeycombs are removed from the plastic molds and wrapped in plastic foil to ensure that the large pores between the aggregates are not filled during concreting. Threads are attached to the foil and knotted to the nearest rebars. The rebars are placed into the mold for the specimen through holes drilled into the sides of the mold, which are as large as the rebar diameter. This way, it can be ensured that both the honeycombs and the rebars are placed in the correct position and do not move during concreting. For the specimen, a self-compacting concrete is used to avoid voids or other unwanted defects. The maximum aggregate size is 16 mm, as in the simulation. Table 1 shows the concrete composition. Measurements are conducted 28 days after concreting. The final dimensions of the produced specimen are \(150 \times 150 \times 32\,\text{cm}^{3}\).

Table 1 Concrete mix design

3.3 Measurement Results

On this specimen, a total of 52 ultrasonic line scans are carried out every \(10\,{\text{cm}}\), always leaving \(15\,{\text{cm}}\) to each edge to avoid unwanted reflections from the sides (see black lines in Fig. 8, right). All signals are recorded using the Proceq Pundit PD8000. Using AI positioning technology, the results from multiple single shots along each line can be processed into one large TFM image. The overlap between measurements ranges from 1 to 5 cm. The center frequency is \(65\,{\text{kHz}}\) in all measurements, which is also the highest frequency that the device can generate. This makes it possible to assess the maximum capability of the device. At the start of the inspection, the shear wave velocity is measured, and this value is then used for the TFM. In the end, all line scan TFM results are exported as images, which are then analyzed manually to detect the honeycombs in a hit/miss fashion. For a honeycomb to count as detected, either the backwall echo amplitude must be significantly reduced, or a direct honeycomb reflection must be identifiable. Shifted backwall echoes are not found in the data. Note that measurements are carried out not only directly above the honeycombs, but also along lines where no defect is present. These results serve as a reference for visual honeycomb detection in the processed line scan images.
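One way to arrive at the number of 52 line scans from the stated spacing and edge distance is sketched below in Python. The assumption made here is that the lines run in both lateral directions and on both sides of the mock-up; the function name is hypothetical.

```python
def scan_positions(length, margin, spacing):
    """Line positions from margin to length - margin in steps of spacing (in m)."""
    n = int(round((length - 2.0 * margin) / spacing)) + 1
    return [round(margin + i * spacing, 3) for i in range(n)]


# Lateral size 1.5 m, 15 cm distance to each edge, one line every 10 cm.
positions = scan_positions(1.5, 0.15, 0.10)

# Assumption: lines in x- and y-direction, measured on front and back side.
n_directions, n_sides = 2, 2
n_scans = len(positions) * n_directions * n_sides

print(len(positions), "positions per direction,", n_scans, "line scans")
```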

Overall, each of the 36 honeycombs is measured four times (measurements in x- and y-direction on both sides of the mock-up), giving 144 honeycomb measurements. In total, about \(69\%\) (99/144) of these measurements lead to a detection, showing either a clear direct reflection or a weakened backwall echo. A weakened backwall echo is observed in 83 of the 144 measurements (\(58\%\)), while a direct reflection can only be observed in 64 (\(44\%\)). Examples of detected and undetected honeycombs can be seen in the line scan obtained at \(x=1.15\,\text{m}\) (Fig. 9). Here, the complexity of the signals also becomes visible. Of the four honeycombs present in this image, only three exhibit a weakened backwall echo, while only two show a clear direct reflection signal. Therefore, it is likely that at least one honeycomb (especially the one at \(y= 1.35\,{\text{m}}\)) would be overlooked in a real inspection. In general, the backwall echo as well as the top rebar layer are clearly identifiable in all measurements. However, the amplitude of the backwall echo is highly variable even in the undamaged areas of the specimen. Such an effect can also be observed in Fig. 9 at a position of about \(1.1\,{\text{m}}\). This variability could be mistaken for a defect indication. Additionally, noise is present in all sections of the data. In some cases, this noise also appears as an additional reflection (false alarm), which might be caused by a strong local heterogeneity of the concrete. An example of such a signal can be seen at a lateral position of about \(0.25\,{\text{m}}\) shortly before the backwall echo (marked red in Fig. 9).
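The reported counts can be cross-checked with a short Python sketch. Since a detection requires at least one of the two indications, the number of measurements showing both indications follows from inclusion-exclusion. The variable names are illustrative; the counts are those given in the text.

```python
n_honeycombs = 36
n_repeats = 4                      # x- and y-direction on both sides
n_total = n_honeycombs * n_repeats

detected = 99                      # at least one indication
weakened_backwall = 83
direct_reflection = 64

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both = weakened_backwall + direct_reflection - detected
only_backwall = detected - direct_reflection
only_direct = detected - weakened_backwall

rates = {
    "detected": 100.0 * detected / n_total,
    "weakened backwall echo": 100.0 * weakened_backwall / n_total,
    "direct reflection": 100.0 * direct_reflection / n_total,
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} %")
print(f"both: {both}, backwall only: {only_backwall}, direct only: {only_direct}")
```

Under this reading, both indications appear together in 48 measurements, while 35 detections rely on the backwall echo alone and 16 on the direct reflection alone.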

Fig. 9

Exemplary data interpretation for a UT B-Scan. Indications from steel reinforcements are marked in orange. Indications for honeycombs (reflections or missing backwall echo) are marked in yellow. The measurement was conducted in y-direction at x = 1.15 m (marked in red in Fig. 8, right) (Color figure online)

4 Discussion

The measurement results suggest that the design method was successful in finding defect scenarios that are difficult to detect. However, a direct comparison between a simulated defect scenario and a measured one has yet to be made. The purpose of the simulated scenarios was to identify approximate size and depth ranges for honeycombs. It is important to note that recreating a simulated scenario with the exact same geometry in the actual specimen was not the intention. Figure 10 provides an example of a measured scenario closely resembling a simulated one. Here, the honeycomb has a diameter of \(8\,{\text{cm}}\) at a depth of \(13\,{\text{cm}}\). The only difference between the simulated and the experimental scenario is that the honeycomb is shifted slightly to the side of the rebar in the measurement. Overall, the two TFM results share many features: the rebar echo and the backwall echo are clearly visible. A direct reflection of the honeycomb is visible in both cases, but its amplitude is much smaller than that of the rebar echo. In the simulation, the honeycomb reflection has a relative amplitude of \(64\%\) compared to the rebar echo. For the measurement, this value is approx. \(32\%\), meaning that either the honeycomb echo is overestimated in the simulation or the measurement was not conducted exactly above the honeycomb. The latter is likely, as honeycomb positions may have shifted slightly during concreting. The rebar echo is much smaller in the simulation than in the measurement, which might be a sign of local, unintended honeycombing near the rebar. As the backwall echo is also much broader in the measurement, damping is likely stronger in the measurement than in the simulation. In the measurement result, the rebar echo is not clearly separated from the honeycomb echo, whereas there is a gap of about \(6\,{\text{cm}}\) in the simulation result. Given that the main honeycomb echo aligns with the correct depth in the measurement result, an uplift of the honeycomb during concreting is unlikely.

Fig. 10

Comparison between measured (left) and simulated (right) defect scenario. Rebar positions are highlighted in orange, while positions of honeycombs are marked yellow (Color figure online)

In both cases, the backwall echo is weakened to some degree in the areas affected by the honeycomb. The backwall echo reduction is about \(64\%\) in the simulation and \(43\%\) in the measurement. This overestimation could be attributed to the same factors as the overestimated honeycomb reflection amplitude.
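
To put the differences between simulation and measurement on a common scale, the quoted percentages can be converted into ratios and decibels. This is only a sketch under stated assumptions: the study does not specify how the backwall echo reduction is computed, so it is assumed here that "reduction" means one minus the ratio of the backwall amplitude behind the defect to a reference amplitude. All names are illustrative.

```python
import math

# Values quoted in the text, written as fractions rather than percent.
honeycomb_vs_rebar = {"simulation": 0.64, "measurement": 0.32}
backwall_reduction = {"simulation": 0.64, "measurement": 0.43}


def to_db(amplitude_ratio):
    """Express an amplitude ratio in decibels (20 * log10)."""
    return 20.0 * math.log10(amplitude_ratio)


# Factor by which the simulated relative honeycomb echo exceeds the measured one.
echo_factor = honeycomb_vs_rebar["simulation"] / honeycomb_vs_rebar["measurement"]
echo_factor_db = to_db(echo_factor)

# Assumption: reduction = 1 - (amplitude behind defect / reference amplitude).
backwall_remaining = {
    case: 1.0 - reduction for case, reduction in backwall_reduction.items()
}

print(f"relative honeycomb echo, simulation vs. measurement: "
      f"factor {echo_factor:.2f} ({echo_factor_db:.1f} dB)")
for case, remaining in backwall_remaining.items():
    print(f"remaining backwall amplitude ({case}): {100.0 * remaining:.0f} %")
```

Note that the honeycomb amplitudes are given relative to the rebar echo, which itself differs between simulation and measurement, so the factor computed here compares relative rather than absolute amplitudes.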

The fact that the main defect features are reproduced by the simulation also indicates that the chosen numerical model for the honeycombs behaves realistically. For future studies, the numerical model should ideally be validated beforehand. Here, the successful feasibility study [24], in which a smaller specimen was produced using the same approach and the same numerical model, was regarded as a validation of the model. The comparison between measurement and simulation results supports this as well.

Apart from the previously discussed differences, the simulation captures the main features of the defect scenario well and produces a realistic noise level. Therefore, the detectability of real defects should generally be estimated well with the simulations. The importance of simulating noise for this purpose can be demonstrated by comparing the same defect scenario using a homogeneous and a heterogeneous concrete model (see Fig. 11). Here, it is evident that even the smallest direct reflection is identifiable if a homogeneous concrete model is used. With such a model, a realistic estimation of defect detectability would therefore be impossible.

Fig. 11

Comparison of simulations of a honeycomb with 4 cm diameter at a depth of 8 cm without rebars for a homogeneous (left) and heterogeneous (right) concrete model

Another aspect that affects defect detection, but has not been considered in this study, is the choice of migration velocity. To transform the time signals into depths, the correct velocity must be chosen for the TFM. If the chosen velocity is incorrect, the signals will not focus as well and will be mapped to the wrong depth, so defects might be overlooked. Choosing a single suitable velocity for the TFM also requires that the wave velocity inside the reconstructed volume is homogeneous or at least free of gradients. However, due to heterogeneous hydration conditions and compaction effects, this might not always be the case for concrete. Other processing methods such as full-waveform inversion [28] allow for heterogeneous velocity models, but they are usually too computationally expensive to be applied to large data sets.
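
The influence of the migration velocity can be illustrated with a small delay-and-sum sketch of the TFM for a single point reflector. This is not the processing chain used in the study: the array geometry (8 elements, 3 cm pitch), the shear wave velocity of 2700 m/s, the pulse shape, and the noise-free synthetic signals are all assumptions chosen for illustration; only the \(65\,{\text{kHz}}\) center frequency and the \(13\,{\text{cm}}\) depth are taken from the text.

```python
import math

F_C = 65e3                 # center frequency from the text (Hz)
C_TRUE = 2700.0            # assumed shear wave velocity (m/s)
PITCH = 0.03               # assumed element spacing (m)
N_ELEM = 8                 # assumed number of array elements
ELEMENTS = [(i - (N_ELEM - 1) / 2) * PITCH for i in range(N_ELEM)]
SCATTERER = (0.0, 0.13)    # point reflector at 13 cm depth (x, z in m)


def path(x_elem, x, z):
    """One-way distance from a surface element to the point (x, z)."""
    return math.hypot(x - x_elem, z)


def pulse(t):
    """Short Gaussian-windowed cosine centered at t = 0 (illustrative)."""
    sigma = 0.5 / F_C
    return math.exp(-(t / sigma) ** 2) * math.cos(2 * math.pi * F_C * t)


def signal(i, j, t):
    """Synthetic noise-free signal for transmitter i and receiver j."""
    t_ij = (path(ELEMENTS[i], *SCATTERER) + path(ELEMENTS[j], *SCATTERER)) / C_TRUE
    return pulse(t - t_ij)


def tfm_pixel(x, z, c_mig):
    """Delay-and-sum over all transmit-receive pairs for one image point."""
    total = 0.0
    for i in range(N_ELEM):
        for j in range(N_ELEM):
            t = (path(ELEMENTS[i], x, z) + path(ELEMENTS[j], x, z)) / c_mig
            total += signal(i, j, t)
    return abs(total)


def depth_peak(c_mig, z_min=0.05, z_max=0.25, n=201):
    """Return (depth, amplitude) of the strongest pixel below the array center."""
    depths = [z_min + k * (z_max - z_min) / (n - 1) for k in range(n)]
    amps = [tfm_pixel(0.0, z, c_mig) for z in depths]
    k_max = max(range(n), key=amps.__getitem__)
    return depths[k_max], amps[k_max]


z_true, a_true = depth_peak(C_TRUE)          # correct migration velocity
z_wrong, a_wrong = depth_peak(1.1 * C_TRUE)  # velocity overestimated by 10 %
print(f"correct velocity: peak at {100 * z_true:.1f} cm, amplitude {a_true:.1f}")
print(f"+10 % velocity:   peak at {100 * z_wrong:.1f} cm, amplitude {a_wrong:.1f}")
```

With the correct velocity, all transmit-receive pairs add up coherently at the true reflector depth. With the overestimated velocity, the indication moves to a greater apparent depth and its amplitude drops, because the individual pairs no longer focus at one common point.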

5 Conclusions and Outlook

This study demonstrates a possible approach for the design of mock-ups for capability assessments of UT in civil engineering using numerical simulations. The EFIT simulations can qualitatively predict UT measurement results well. This also includes the noise and the radiation characteristics of the point sources needed to simulate ultrasonic dry point contact transducers. After validation, the code was used for defect detectability estimation to find scenarios in which honeycombs of different sizes and depths are not trivially detectable. The simulation results were then used directly to design a mock-up. Overall, about \(68\mathrm{\%}\) of the defects inside this specimen could be detected using UT, which implies that the design approach was successful and can be recommended for future studies. In the future, other NDT methods can also be tested on this mock-up to allow for a performance comparison in different testing scenarios.

This investigation paves the way for a reliability analysis of UT measurements in reinforced concrete, which shall be carried out in the near future.