1 Introduction

Infrasound signals in the atmosphere from natural and anthropogenic events propagate in a very dynamic and anisotropic medium, in contrast to seismic waves in the solid earth, where the propagation medium can be considered static. With the advent of the Comprehensive Nuclear-Test-Ban Treaty (CTBT), established in 1996 after decade-long disarmament negotiations to monitor all environments with the aim of banning nuclear explosions, infrasound technology has experienced a revival of the early research activities of the 1950s and 1960s. At that time, atmospheric testing was carried out extensively until it was banned by the Limited Test Ban Treaty (LTBT) in 1963, which prohibits nuclear testing in outer space, under water and in the atmosphere (Dahlman et al. 2010).

Numerous studies in recent years (e.g. Fuchs et al. 2019; Green et al. 2011; Koch 2010; Koch and Pilger 2019, 2020; Le Pichon et al. 2008) have found examples that the atmospheric specifications available from analysis data of numerical weather prediction models may not necessarily describe infrasound observations at regional distances adequately enough. In particular, the atmospheric models have lacked the corresponding duct that was most likely implied to exist based on the wave field parameters extracted from the recorded signals. As described by Le Pichon et al. (2008) or Koch (2010) infrasound detection is governed by the seasonal reversal of the stratospheric wind pattern around the spring and fall equinoxes, with the stratospheric wind pattern pointing towards the east during winter months and reversing during the summer. This consistent pattern was demonstrated by Koch (2010) using infrasound observations from rocket engine tests. These were development tests carried out by the German Aerospace Center (in German: Deutsches Zentrum für Luft- und Raumfahrt, DLR) facility near Heilbronn, Germany, for the Ariane 5 main booster engine in the years 2000–2004. The infrasound array IS26 of the CTBT International Monitoring System (IMS), located at a distance of 320 km to the east-southeast from the source site, recorded a larger number of related infrasound signals in this period. However, infrasound signals from these tests were only detected in the months from November to April, whereas no detections were found during the rest of the year. In contrast to the present study, Koch (2010) did not take into account stratospheric conditions or infrasound propagation modeling, therefore neither any detections around the equinoxes nor the frequent non-detections in winter were discussed or quantified with respect to the expected detection performance. 
The winter observations of Ariane 5 engine tests were generally confirmed, also by utilizing propagation modeling, during a field campaign from late 2011 to May 2012 (Pilger et al. 2013) for nine further engine tests. However, during that campaign an infrasound observation was made in May 2012 although ray tracing incorporating the European Centre for Medium-Range Weather Forecasts (ECMWF) high-resolution (HRES) operational atmospheric model analysis did not provide ducting conditions for an arrival at IS26, except when introducing model variations from gravity wave perturbations by Gardner et al. (1993).

In fall 2018 (3 September), an explosion near Ingolstadt, Southern Germany, produced an impressive number of infrasound observations within a few hundred kilometers, providing evidence that common atmospheric model products may fail in specific cases to model infrasound arrivals (Fuchs et al. 2019; Koch and Pilger 2020). In these cases the modeling strongly indicated sole (or multipath) thermospheric returns at distances between 200 and 600 km, while the celerity and trace velocity estimates obtained from waveform analyses strongly indicated that the observed arrivals were in fact stratospherically ducted.

Other studies (Blixt et al. 2019; Hupe et al. 2019; Le Pichon et al. 2015; Smets et al. 2015) have been conducted in the framework of the Atmospheric Dynamics Research Infrastructure in Europe (ARISE) project (Blanc et al. 2018) and were aimed at assessing the accuracy of middle atmosphere numerical weather prediction models, e.g., from ECMWF. In these studies model data from ECMWF were compared to temperature and wind speed components derived from satellite and ground based observation systems or validated using propagation between ground-truth sources and infrasound stations. Smets et al. (2015) used large mining explosions as ground-truth cases for the simulation of infrasound propagation with ensembles of perturbed ECMWF models, where neither the basic model cases nor the perturbed analyses can fully explain all station observations. Le Pichon et al. (2015) found from lidar and wind radiometer instruments at the Haute Provence observatory (OHP, France) that these measurements are broadly consistent with the ECMWF model for altitudes up to ~ 40 km, but differences become significant for larger heights, with differences exceeding 5 K for temperature and 20 m/s for zonal wind. Hupe et al. (2019) compared the temperature field of the ECMWF HRES atmospheric model analysis with the temperature field of mobile lidar measurements, which indicated a cold bias above 40 km, with a maximum of about 12 K near 60 km altitude. They also evaluated the impact on the detection pattern for microbaroms after inclusion of uncertainties in temperature and horizontal winds and found a significant improvement compared to the direct output of the ECMWF model. Blixt et al. (2019) used ground-truth explosions from scheduled ammunition destruction to estimate the modeling performance of ECMWF reanalysis products in terms of estimating cross-winds and comparing them to observed back-azimuth deviations at an infrasound array in 178 km distance. 
They found good agreement between detections and modeling, although the receiver is situated in a classical shadow-zone distance, involving the need of partial reflections for a suitable explanation of the detections. The setting and method were described in follow-up studies (Amezcua and Barton, 2021; Amezcua et al. 2020) with the aim of assimilating the infrasound data from ground-truth sources to constrain horizontal winds and implement these into atmospheric models. These studies are in line with our present study, aiming at the description and improvement of atmospheric model products and ultimately of operational numerical weather forecasts by utilizing ground-truth infrasound data, here rocket engine tests.

In a recent study by Pilger et al. (2021), the infrasonic signature of 1001 rocket launches, including 66 Ariane 5 rocket launches, was investigated. While long-range infrasound detectability of this rocket type was clearly proven there, the present study aims primarily at regional infrasound propagation and at a different Ariane 5 source configuration. Firstly, we study here infrasound emitted from the main engine, while in the referenced study additional booster stages are involved. Secondly, we deal with a spatially fixed ground-truth source, while Pilger et al. (2021) consider a laterally and vertically moving infrasound source along its flight path.

Here, we take the opportunity of a large ground-truth dataset of 159 rocket engine tests over the past two decades to investigate the adequacy of the ECMWF analysis model in correctly reproducing arrivals or non-arrivals at station IS26 in the Bavarian Forest (see also Fig. 1). In particular, we focus on those cases where an arrival is modeled but no observation exists, and on those where arrivals exist but the model fails to reproduce them. The distance of 320 km is of particular interest since it normally separates the range of tropospheric wave propagation from the range where atmospheric infrasound propagation is dominated by stratospheric ducting. Both ranges are separated by an acoustic shadow zone (Gutenberg 1939). This study further investigates how strongly such a shadow zone contributes to non-detections.

Fig. 1

Geographical map showing the location of the propulsion test facility P5 of DLR (star) near Heilbronn and the IMS infrasound station IS26 (triangle) in the Bavarian Forest. Distance circles (white) in increments of 100 km are also displayed, with IS26 at a range of 320 km and a backazimuth of 280° towards P5. Additionally, the locations of major cities are given

2 The Ground-Truth Source and Receiver Locations

The ground-truth source used for this study is the one already described by Koch (2010), which was also the subject of the study by Pilger et al. (2013). In the 1990s, the Space Propulsion Institute of DLR at Lampoldshausen near Heilbronn, Southern Germany, established a testing facility (called P5) within the development programme for the Ariane 5 main engine. This facility is located at latitude 49.29°N and longitude 9.38°E (Fig. 1). In the early years, up to 2004, the purpose of the facility was to test the design concepts for the VULCAIN I and II engines (Koch 2010). Afterwards the testing shifted to acceptance testing of the engines before their use in the space flight programme of the European Space Agency (ESA), which has continued to the present (e.g. see https://www.dlr.de/ra/en/, last access 07 April 2022). For each test the main engine of Ariane 5, with a thrust of 1000 kN, is mounted in the P5 facility, which provides a nearly identical setup for each test, including the deflection of the propulsion jet towards the southeast, approximately in the direction of the infrasound array station IS26 that records the atmospheric pressure waves. We therefore assume that the infrasound generation mechanism is nearly identical from test to test and that regional infrasound observations are, for the most part, governed by the atmospheric parameters at the times of the tests.

As shown in Fig. 1, IMS station IS26 is located at 48.85°N and 13.71°E at an azimuth of 97° and a distance of 320 km from Lampoldshausen. With the known test duration and the theoretical backazimuth of 280°, two parameters are available to reliably identify the infrasound signal in the observations: (1) the signal duration in the waveforms and (2) the backazimuth from array processing techniques. For the years 2000 to 2019, we received the ground-truth parameters (date and time, test duration) for 172 engine tests from DLR. Thirteen of these had durations of less than 10 s, while 93 tests lasted more than 600 s, which is considered the minimum thrusting time of this stage for a successful rocket launch. Of all tests, 96 were carried out during the initial five years starting in 2000 (Koch 2010), whereas the ground-truth information for another set of 76 tests over the subsequent 15 years was collected recently (K. Fröhlke, pers. communication). Excluding the 13 (probably failed) tests with a duration below 10 s leaves a total of 159 events as the ground-truth dataset for this study.

3 Data Analysis

With the distance of 320 km between IS26 and Lampoldshausen and celerities of tropospheric acoustic waves of 320–350 m/s (Negraru et al. 2010), such waves would arrive between 15 and 16 min after the propulsion test’s origin time, while stratospheric waves with celerities of 280–320 m/s would arrive another 2 min later. Within these delay time windows an interactive search for corresponding signal onsets was carried out. When a signal was identified, mainly based on a multi-channel increase of root-mean-square (RMS) amplitude for a time length approximately corresponding to the propulsion duration, a frequency-wavenumber (F-K) analysis (Stammler 1993) was carried out to find the backazimuth and the apparent velocity (derived from F-K slowness values) parallel to the approach described in Koch (2010). In most cases a signal was identified, reflecting the theoretical backazimuth within a few degrees.
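The delay windows quoted above follow from dividing the source-receiver distance by the celerity bounds. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the computed bounds are slightly wider than the rounded figures given in the text.

```python
def arrival_window_min(distance_km, celerity_min_ms, celerity_max_ms):
    """Return (earliest, latest) arrival delay in minutes for a celerity range."""
    distance_m = distance_km * 1000.0
    earliest = distance_m / celerity_max_ms / 60.0  # fastest waves arrive first
    latest = distance_m / celerity_min_ms / 60.0    # slowest waves arrive last
    return earliest, latest


DISTANCE_KM = 320.0  # Lampoldshausen to IS26, as stated in the text

# Celerity ranges as quoted in the text (m/s)
tropospheric = arrival_window_min(DISTANCE_KM, 320.0, 350.0)
stratospheric = arrival_window_min(DISTANCE_KM, 280.0, 320.0)

print("tropospheric window (min): %.1f to %.1f" % tropospheric)
print("stratospheric window (min): %.1f to %.1f" % stratospheric)
```

The latest tropospheric and the earliest stratospheric delay coincide because both ranges share the 320 m/s celerity bound.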

In Fig. 2, waveforms are shown for several events that were recorded at IS26 with different signal-to-noise ratios (SNR). The data are high-pass filtered at 2 Hz. The frequency range considered in this study is therefore much higher than the frequency band commonly considered in infrasound studies based on IMS stations, ranging between several tenths to a few Hz (Campus 2004; Campus and Christie 2010). The waveforms of the upper two events, cases no. 1 and 2, where signals above 3 Hz clearly stand out from the background noise, represent data with a rather decent SNR of about 3 to 5. The third event shows more emergent, and hence less prominent, waveforms with an SNR between 1 and 2. The lower waveforms are from two cases that do not exhibit any engine test signals, but only noise bursts or signals of no interest. Case no. 4 is from a propulsion test in winter (December 2000), when atmospheric ducting conditions between the DLR facility and IS26 are usually favorable, while for case no. 5, a test in summer (July 2001), the propagation conditions normally do not allow the observation of a signal.

Fig. 2

Waveform recordings of 2 Hz high-pass filtered infrasound signals at IS26 for five Ariane 5 engine test events (traces normalized per event and labeled with a running number, the channel information and an “F” as code letter for infrasound). Shown are good signals for the top two events, fair to poor signals for the middle event, and no signals for the two bottom events; from top to bottom: (1) 23-Nov-2000, (2) 14-Feb-2013, (3) 16-Feb-2012, (4) 7-Dec-2000, (5) 3-Jul-2001. All engine tests presented here had durations of more than 600 s, as can be deduced from the traces for the top three events

As previously found for Central Europe, pressure sources west of infrasound stations lead to frequent infrasound detections from the fall to the spring equinox, while signal detections from the opposite direction are prominent during the summer months (e.g., Gibbons et al. 2015; Green et al. 2011; Koch 2010; Le Pichon et al. 2008; Pilger et al. 2018). Table 1 summarizes the detection statistics for all 159 Ariane 5 engine tests from 2000 to 2019 as observed at IS26, confirming this pattern. In this compilation all tests of less than 10 s duration were left out, as these may not have been easily identified due to the short duration or an inability to obtain stable F-K analysis results. Except for a single observation each in May and September, all other 69 engine tests that were detected at IS26 occurred from October to April. In this winter season, only one quarter of the tests resulted in non-detections at IS26. Non-detections in the summer months May to September are pervasive; i.e., for 63 (97%) out of 65 engine tests in the summer season, it was not possible to find an associated signal.

Table 1 Monthly statistics of the numbers of Ariane 5 engine tests detected or non-detected in the years 2000–2019, also giving the portion in each category

For each identified infrasound signal a frequency-wavenumber analysis was carried out. The raw waveform data were high-pass filtered at 2 or 3 Hz, depending on which setting gave the clearer signal, and the analysis was carried out for frequencies up to 6 Hz, taking into account a previous estimation of the high-frequency signal content of rocket engine tests and optimal processing parameters for the same kind of source (Koch 2010). A maximum slowness of 450 s/deg was applied with a discretization of 120 points, as required by the analysis program (Stammler 1993), covering trace velocities down to 250 m/s. The best estimates per event for backazimuth and derived apparent velocity are displayed in Fig. 3. The observed backazimuth from IS26 to Lampoldshausen is in good agreement with the true value of 280°, scattering by about ± 5°. For the slowness we mostly obtained values between 300 and 340 s/deg, translating into trace velocities between 330 and 370 m/s. These values indicate stratospheric ducting, as they appear to exceed the near-surface sound velocities of 330–340 m/s. In Fig. 3, a stronger scatter in the estimated array processing results seems to occur prior to 2008. Its reduction afterwards may be related to the upgrade from a five-element to an eight-element infrasound array, providing more stable array-processing results. Exemplary calculations of the theoretical uncertainties of trace velocity and backazimuth (e.g., Szuberla and Olson 2004) for the five-element and eight-element configurations of IS26 show a reduced scatter in both parameters for the larger array (1.1° instead of 1.8° and 4.2 m/s instead of 5.8 m/s). These uncertainty estimates qualitatively explain the variation in Fig. 3.
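The conversion from F-K slowness in s/deg to trace velocity in m/s can be sketched as follows. This is an illustration only: it assumes a spherical Earth with a mean radius of 6371 km (about 111.19 km per degree), which is not stated in the text, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0                         # assumed mean Earth radius
KM_PER_DEG = math.pi * EARTH_RADIUS_KM / 180.0   # roughly 111.19 km per degree


def trace_velocity_ms(slowness_s_per_deg):
    """Convert horizontal slowness (s/deg) to trace velocity (m/s)."""
    return KM_PER_DEG * 1000.0 / slowness_s_per_deg


# Slowness values quoted in the text
for slowness in (450.0, 340.0, 300.0):
    print("%5.0f s/deg -> %6.1f m/s" % (slowness, trace_velocity_ms(slowness)))
```

Under this assumption, the 450 s/deg slowness limit corresponds to about 247 m/s, and the 300 to 340 s/deg range to about 327 to 371 m/s, consistent with the rounded values given above.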

Fig. 3

Results of the array data processing for the infrasound signals at IS26. The backazimuth scatters around the theoretical value of 280° by about ± 5°. The slowness estimate from F-K analysis (i.e., output of the applied software) is converted to apparent (or trace) velocity, scattering between 310 and 375 m/s. Note the larger scattering range before 2008, which may be related to the smaller number of five array elements compared to the present-day eight elements

In order to assess the fraction of non-detections that may be due to increased background noise levels at the IS26 array, we present signal and noise levels for detections and noise levels only for non-detections, both measured at element I26H1, as shown in Fig. 4. For identified signals the estimation was carried out in windows representative of the best and most stable signal level, including the entire signal where possible but excluding any spikes or noise bursts. Noise windows were selected immediately preceding the signal window. In the case of non-detections the noise window was selected within the expected arrival time window. The RMS amplitude served as the measure of signal and noise level, since it is a more stable amplitude estimate than the peak-to-peak amplitude in the presence of incidental noise such as spikes or bursts. It is noteworthy that noise levels prior to 2005 are higher than for later years by about 50%. This effect is most likely associated with the wind noise reduction filter system installed at IS26, which was upgraded in early June 2004 with capillaries in the pipe arrays, modifying the acoustic impedance to reduce resonant noise effects above 5 Hz. Koch (2010) assessed signal levels at all elements of IS26 for the years 2000–2004, with a subtle decrease observed for I26H4 in 2003. This element was temporarily equipped with such impedance reducers during that period to test the suppression of spurious spectral peaks caused by the spatial noise reduction system used at IS26.
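The RMS level measure described above can be sketched as follows. The SNR definition as the ratio of signal-window RMS to noise-window RMS is an assumption made here for illustration, and the sample values are invented.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of pressure samples (Pa)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr(signal_window, noise_window):
    """Assumed SNR definition: RMS of the signal window over RMS of the noise window."""
    return rms(signal_window) / rms(noise_window)


# Invented example samples in Pa (not measured data)
noise = [0.004, -0.004, 0.004, -0.004]
signal = [0.012, -0.012, 0.012, -0.012]

print("noise RMS (Pa):", rms(noise))
print("SNR:", snr(signal, noise))
```

Because every sample contributes to the mean square, a single spike changes the RMS far less than it changes a peak-to-peak amplitude, which is the stability property referred to in the text.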

Fig. 4

RMS amplitude measurements for pre-signal noise levels (blue diamonds) in case of detections, and noise levels during the expected signal duration in case of non-detections (red crosses). Note the reduced scatter after early June 2004 (vertical line) when capillaries were installed at the infrasound station’s wind noise reduction system (see text for further details). Horizontal dashed lines denote the maximum RMS amplitude noise level for detections before and after June 2004

While noise levels of detections were below 0.017 Pa prior to the configuration change, the noise levels thereafter were below 0.008 Pa. From this result, we conclude that engine tests with noise levels above this baseline level may not be detected due to an insufficient signal-to-noise ratio. Of course, the opposite conjecture may also hold: cases with pre-event signal levels (i.e., noise) below the baseline should be detected. Of the noise levels of all non-detection cases shown in Fig. 4, more than 80% are below the given amplitude levels. This indicates that variations near the source and increased ambient noise levels are not a predominant cause for the lack of detections.
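The baseline comparison described above can be illustrated with a short sketch. The two thresholds are the values quoted in the text; the function names, the boolean flag for the June 2004 upgrade, and the example noise levels are hypothetical.

```python
# Maximum noise RMS (Pa) observed for detections, as quoted in the text
NOISE_BASELINE_PA = {"before_upgrade": 0.017, "after_upgrade": 0.008}


def noise_permits_detection(noise_rms_pa, after_upgrade):
    """True if the noise level lies below the baseline valid for that period."""
    key = "after_upgrade" if after_upgrade else "before_upgrade"
    return noise_rms_pa < NOISE_BASELINE_PA[key]


def fraction_below_baseline(cases):
    """cases: iterable of (noise_rms_pa, after_upgrade) for non-detections."""
    cases = list(cases)
    n_below = sum(noise_permits_detection(n, a) for n, a in cases)
    return n_below / len(cases)


# Invented non-detection cases (not the measured values of Fig. 4)
example_cases = [(0.010, False), (0.020, False), (0.005, True),
                 (0.007, True), (0.003, True)]
print(fraction_below_baseline(example_cases))
```

A high fraction of non-detections below the baseline, as reported in the text, means that noise alone cannot account for most of the missing signals.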

4 Atmospheric and Propagation Modeling

Atmospheric conditions are significantly different between spring and fall equinoxes, in particular with respect to the reversal of the dominant stratospheric wind direction (Le Pichon et al. 2015). The particular atmospheric model considered here is ECMWF’s HRES operational atmospheric model analysis produced by the Integrated Forecast System (IFS), which is specified for altitudes up to around 70 km at 6 h temporal resolution. The ECMWF model’s analysis data of the physical parameters pressure, temperature and wind speeds are used on a 0.5° × 0.5° horizontal grid at 60 (cycles 21r4 to 29r2, until 2006), 91 (cycles 30r1 to 38r1, until 2013) or 137 model levels (cycles 38r2 to 46r1). For the 2D finite-difference ray tracing modeling (adapted from a seismological ray tracer; Margrave and Lamoureux, 2019) we apply a 2D atmospheric background model of effective sound speed profiles from spatially and temporally interpolated ECMWF profiles of temperature and wind speed in propagation direction along the source-receiver path.
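The effective sound speed used for the background model is commonly approximated as the adiabatic sound speed plus the wind component along the propagation direction. The text does not spell out the formula, so the sketch below uses this standard approximation with assumed constants for dry air; the function name and the example values are illustrative.

```python
import math

GAMMA = 1.4          # assumed ratio of specific heats for dry air
R_DRY_AIR = 287.05   # assumed specific gas constant of dry air, J/(kg K)


def effective_sound_speed(temp_k, u_east_ms, v_north_ms, azimuth_deg):
    """Adiabatic sound speed plus along-path wind (m/s).

    azimuth_deg is the propagation direction, clockwise from north.
    """
    c = math.sqrt(GAMMA * R_DRY_AIR * temp_k)
    az = math.radians(azimuth_deg)
    along_path_wind = u_east_ms * math.sin(az) + v_north_ms * math.cos(az)
    return c + along_path_wind


# Example: eastward propagation (azimuth 90 deg) with and without a 50 m/s westerly wind
calm = effective_sound_speed(288.15, 0.0, 0.0, 90.0)
windy = effective_sound_speed(288.15, 50.0, 0.0, 90.0)
print("calm: %.1f m/s, with westerly wind: %.1f m/s" % (calm, windy))
```

For the nearly eastward path from Lampoldshausen to IS26, the zonal wind dominates the along-path term, which is why the seasonal reversal of the stratospheric wind has such a strong effect.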

As a reference model for each test case, a 1D effective sound speed profile is selected at the midpoint between the source (DLR) and the receiver (IS26), geographically lying near the center of a triangle outlined by the cities of Nuremberg, Regensburg and Ingolstadt. These 1D effective sound speed profiles for each Ariane 5 engine test in the last 20 years are compared with each other to identify gross specific features that could explain the differences in observations (Fig. 5). The 1D midpoint profile is considered a good characterization of the atmospheric state, as the west-to-east conditions are quite stable over the distance range of 320 km between source and receiver. All ECMWF profiles for the detection cases (Fig. 5a) show a distinct maximum in effective sound speed between 40 and 60 km leading to a strong stratospheric duct. The similarity of these atmospheric profiles is reflected in the mean and median profiles, which are closely matching. Even when taking the standard deviation curves into account, the effective sound speed ratio—i.e., the ratio of the effective sound speed at stratospheric heights to the one near the ground—exceeds the value of one, which, according to Le Pichon et al. (2012), represents a sufficient condition for stratospheric ducting. Slightly lower values in the order of, e.g., 0.98 to 1 have proven to also occasionally permit stratospheric ducting (see Le Pichon et al. 2012; Koch and Pilger 2020) and will be considered as a separate class in this study. The ECMWF profiles for the non-detection cases (Fig. 5b) provide a contrasting picture, with about 75% of the cases not showing an effective sound-speed peak for stratospheric heights, therefore not enabling stratospheric ducting. This is, of course, reflected in the mean and median profiles, which do not reach effective sound speed ratios of one and therefore explain well the non-observations. 
However, it is also shown that about 25% of the non-detection cases indicate the presence of a stratospheric duct with the potential for stratospheric arrivals, which is reflected in the clear mismatch between mean and median and in the shape of the positive standard deviation curve. This latter curve also exhibits a sound speed ratio larger than one.

Fig. 5

Effective sound speed profiles from the surface to the upper stratosphere for times of (a) signal detections and (b) non-detections using ECMWF profiles. From the individual profiles (grey lines) a mean (solid line) and a median profile (dotted line) were determined, as well as the associated standard deviations (dashed lines). For case (a) the mean and median are nearly identical, while in the case of non-detections (b) they are significantly different

In Fig. 6, histograms are displayed of the effective sound speed ratios (veff-ratio) resulting from the ECMWF model for infrasound detections and non-detections at the times of Ariane 5 tests. Here, the quantity shown is defined as the ratio of the maximum effective sound speed in the altitude range of 30 to 70 km to the maximum sound speed in the lowest 5 km above the ground. Of the 71 detections (Fig. 6a), all but four show ratios above one, with the remaining ones falling short of this value by less than 1.5%. For the non-detections (Fig. 6b), about one quarter of the cases (20) show effective sound speed ratios exceeding the enabling value of one. These cases therefore deserve further discussion below.
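The veff-ratio defined above, together with the ducting classes introduced earlier (ratio above one, and the marginal range of 0.98 to 1), can be sketched as follows. Heights are taken as heights above ground; the function names, class labels and the example profile are hypothetical and not taken from the ECMWF data.

```python
def veff_ratio(heights_km, veff_ms):
    """Max effective sound speed in 30-70 km over the max in the lowest 5 km."""
    pairs = list(zip(heights_km, veff_ms))
    stratospheric = [v for h, v in pairs if 30.0 <= h <= 70.0]
    near_ground = [v for h, v in pairs if h <= 5.0]
    return max(stratospheric) / max(near_ground)


def ducting_class(ratio):
    """Classify a veff-ratio using the thresholds quoted in the text."""
    if ratio > 1.0:
        return "duct"
    if ratio >= 0.98:
        return "marginal"
    return "no duct"


# Invented winter-like profile (height above ground in km, veff in m/s)
heights = [0, 2, 5, 10, 20, 30, 40, 50, 60, 70]
veff = [340, 335, 325, 300, 295, 310, 330, 357, 345, 320]

ratio = veff_ratio(heights, veff)
print("veff-ratio: %.3f -> %s" % (ratio, ducting_class(ratio)))
```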

Fig. 6

Histograms of the effective sound speed ratio of the ECMWF model for the cases (a) with and (b) without signal detections from Ariane 5 engine tests. For signal detections, the veff-ratio of the ECMWF model is > 1 except for four cases, where it is between 0.95 and 1. For nearly half of the non-detected tests it is between 0.85 and 0.9, but in 20 cases the veff-ratio even exceeds 1.0, which is indicative of the existence of a stratospheric duct

Additional examination of the effective sound speed ratios over the seasons (Fig. 7) in relation to detections and non-detections reveals an interesting—even though expected—pattern for winter and summer months (Koch 2010; Le Pichon et al. 2008). Near the times of the equinoxes the veff-ratio changes from values > 1 to values < 0.9. The most striking part is the rather small scatter in the veff-ratio during the summer, while the scatter can be in the order of 40–50% during the winter. In other words, the rather low variability of the effective sound speed ratios in the summer season explains the consistent lack of detections during this time. As effective sound speed is dominated by temperature and wind speeds, the strong scatter during winter is an expression of a higher variability of these parameters between the lower troposphere and the middle atmosphere (stratosphere) and thus also a more inconsistent pattern of detections and non-detections.

Fig. 7

The effective sound speed ratios for the atmospheric profiles associated with the 71 signal detections and 88 non-detections. Cases with signal detections are marked with blue diamonds, while cases with non-detections are shown as red crosses. For the summer months June to August the veff-ratio is fairly stable between 0.85 and 0.9 and associated with the lack of any detection. For the remainder of the year, both detections and non-detections can occur

The probability and consistency of signal detection with veff-ratio is further investigated in Table 2, which lists the numbers of detections and non-detections depending on the veff-ratio. These ratios are subdivided into four ranges, namely > 1.2, 1.0–1.2, 0.98–1.0, and < 0.98. The last class does not contain any detection, as is also indicated in Figs. 6 and 7. Table 2 distinguishes three quality levels for the detections in terms of their signal strength and variability: good, weak, and poor. Good signal conditions indicate SNR ≥ 2, and weak and poor conditions SNR < 2; poor signals furthermore show unstable signal content, i.e. the signal is not clearly separable from noise over the complete engine test duration. In each column of the table we provide the percentage of ducting versus shadow zone cases. Inconsistencies are present when a shadow zone is modeled despite a detection (table values below 100% in the detection cases) or when a stratospheric duct is modeled but no detection was made (table values above 0% in the non-detection cases). The corresponding propagation modeling for all rocket engine tests is performed using 2D ray tracing, launching rays between 4° and 86° inclination in 1° steps. A duct is present when the modeling is able to connect source and receiver with an eigenray (within a horizontal tolerance range of ± 20 km from the receiver, a value also considered in Koch and Pilger 2019). Only stratospheric ducts are considered here, since the high signal frequencies above 2 to 3 Hz suppress efficient thermospheric ducting and the range of 320 km is estimated (and verified from observed arrival times as well as corresponding ray trace modeling) to be too large for efficient tropospheric ducting of the investigated source signals.
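The eigenray criterion described above can be sketched independently of the ray tracer itself. In the illustration below, the list of ground-return ranges stands in for the output of a ray tracing run (one or more ground returns per launched ray); the function names and the example ranges are hypothetical, while the launch angles, receiver range and tolerance are the values quoted in the text.

```python
def launch_angles_deg(start=4, stop=86, step=1):
    """Ray inclinations in degrees: 4 to 86 in 1 degree steps, as in the text."""
    return list(range(start, stop + 1, step))


def has_eigenray(ground_returns_km, receiver_km=320.0, tolerance_km=20.0):
    """True if any ray returns to the ground within the tolerance of the receiver."""
    return any(abs(r - receiver_km) <= tolerance_km for r in ground_returns_km)


# Invented ground-return ranges (km) from two hypothetical ray tracing runs
duct_case = [150.2, 305.0, 460.7]      # one return within 300-340 km
shadow_case = [140.0, 280.0, 420.0]    # receiver falls between returns

print(len(launch_angles_deg()), "rays launched per run")
print("duct case:", has_eigenray(duct_case))
print("shadow zone case:", has_eigenray(shadow_case))
```

The second example shows how a receiver can sit in a shadow zone between two ground returns even though a stratospheric duct exists.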

Table 2 Comparison of observations (detection statistics) and atmospheric conditions (effective sound speed ratios) to propagation modeling (stratospheric ducts or shadow zones) to identify inconsistent cases

Figure 8 summarizes the actual and modeled detectability by classifying the monthly cases into common receiver-operating-characteristic (ROC) quantities. In Fig. 8a, the mid-point effective sound speed profile defines the modeling result (yes if veff-ratio > 1, no otherwise), and in Fig. 8b it is the eigenray criterion. The seasonal variations of the true-positive (event detected and modeled in winter) and true-negative (event not detected and not modeled in summer) cases correspond to the dominant propagation conditions. The unstable ducting conditions around the equinoxes are reflected in increased false-positive (event not detected but modeled) and false-negative (event detected but not modeled) values. In Fig. 8a, the number of false-positive cases (often referred to as false alarms) is increased during the winter months compared to the false-negative cases. At the given source-to-receiver distance of 320 km, shadow zones (e.g., Hedlin and Walker 2013) between the first and second stratosphere-to-ground return may occur when the station is too close for the second ground return within a stratospheric duct but too far away for the first one. This condition is present for large stratospheric refraction altitudes (of 50–70 km) and the corresponding effective sound speed conditions. Of the 29 cases with the highest veff-ratio, i.e., > 1.2 (Table 2, first column), the overwhelming majority of 22 cases consists of "good" detections, with a few cases of weaker and poorer signals and two false-alarm cases. For the next category of smaller veff-ratio, but still above one (Table 2, second column), the number of detections with good signals decreases rapidly. This is somewhat compensated by detections with lower signal levels, while the non-detections reach proportions of about one third of the cases.
In most of these non-detection cases the atmospheric model produces a shadow zone that explains the absence of signals well, which is also the general case for the other categories (Table 2, third and fourth column). Consequently, the number of false-positive cases is lower in Fig. 8b, resulting in more true-negative events during the winter months.

Fig. 8

Monthly statistics of true-positive (TP: detection yes, modeling yes), true-negative (TN: detection no, modeling no), false-positive (FP: detection no, modeling yes) and false-negative (FN: detection yes, modeling no) case numbers relative to the total number of tests per month (Table 1)

With the 21 detections for which the propagation modeling provides shadow zones (i.e., false negative in Fig. 8b, the majority of which are true positive in Fig. 8a) and the 7 non-detections without an associated shadow zone (false positive in Fig. 8b; see also Table 2), we find 28 Ariane 5 tests that the atmospheric modeling does not adequately explain in terms of stratospherically ducted waves. A similar issue has been seen for the Ingolstadt explosion, where both Fuchs et al. (2019) and Koch and Pilger (2020) identified clear stratospheric arrivals in cases where the ECMWF model was not able to reproduce them. Both previous studies therefore suggest that failure to reproduce stratospheric arrivals at regional distances is not rare, but can occur in about 20 percent of the cases, like the 28 of 159 cases of our study mentioned above. Of course, some cases of the present study with IS26 being actually or possibly within a stratospheric shadow zone could be treated within the framework of dynamic gravity wave coupling (Gardner et al. 1993; Hedlin and Drob 2014). Such a modification explains well the occurrence of a detection for the test on 14 May 2012, where propagation using the ECMWF model alone results in a shadow zone, as studied by Pilger et al. (2013).
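The classification into ROC quantities and the resulting mismatch rate can be reproduced with a short sketch. The counts for the eigenray criterion are taken or derived from the text (50 modeled detections, 21 detections within a modeled shadow zone, 7 non-detections without a shadow zone, and hence 81 of the 88 non-detections with a shadow zone); the function names are hypothetical.

```python
from collections import Counter


def classify(detected, modeled):
    """Map an (observation, model) pair to TP, TN, FP or FN."""
    if detected and modeled:
        return "TP"
    if not detected and not modeled:
        return "TN"
    if not detected and modeled:
        return "FP"
    return "FN"


def tally(cases):
    """Count ROC categories for an iterable of (detected, modeled) pairs."""
    return Counter(classify(d, m) for d, m in cases)


# Eigenray criterion (Fig. 8b), totals over all months
cases = ([(True, True)] * 50 + [(True, False)] * 21 +
         [(False, True)] * 7 + [(False, False)] * 81)

counts = tally(cases)
mismatch_fraction = (counts["FN"] + counts["FP"]) / len(cases)
print(dict(counts))
print("mismatch fraction: %.3f" % mismatch_fraction)
```

The mismatch fraction of 28/159 is about 0.18, which the text rounds to about 20 percent.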

To quantify the rate at which the ECMWF model potentially fails to reproduce a signal detection at IS26 from the Ariane 5 tests (i.e., false-negative cases), we can consider the 50 correctly estimated arrivals from propagation modeling and the 21 detections without successful modeling (see Table 2); hence, we get approximately a 1:2 chance of not being able to model the correct propagation result.

For the case of 0.98 < veff-ratio < 1, which is relatively close to the case of stratospheric ducting, we obtain a surprisingly similar result, with a 4:7 chance of detection versus non-detection. Finally, of the 20 non-detections for veff-ratio > 1 (see also Table 3), 13 are explained by shadow zones (i.e., true negative in Fig. 8b), but 7 are not (false positive in Fig. 8a). Again, we have a 1:2 chance of not explaining an observation correctly.
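The case numbers quoted above for the eigenray criterion can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (159 tests, 71 detections, 50 correctly modeled arrivals, 7 non-detections without a shadow zone, 20 non-detections with veff-ratio > 1). The true-negative count is derived by subtraction under the assumption that every test falls into exactly one of the four categories; it is not quoted in the text. The function and variable names are hypothetical.

```python
def tally(total, detections, tp, fp):
    """Derive the remaining confusion-matrix entries by subtraction."""
    fn = detections - tp              # detected, but modeled shadow zone
    tn = total - detections - fp      # not detected, modeled shadow zone
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

TOTAL, DETECTIONS = 159, 71
counts = tally(total=TOTAL, detections=DETECTIONS, tp=50, fp=7)

# Cases not adequately explained by the modeling (FN + FP)
unexplained = counts["FN"] + counts["FP"]
mismatch_rate = unexplained / TOTAL

# Share of detections that the modeling places in a shadow zone
missed_detection_share = counts["FN"] / DETECTIONS

# Share of the 20 non-detections with veff-ratio > 1 lacking a shadow zone
unexplained_nondetection_share = counts["FP"] / 20

print(counts)
print(f"unexplained cases: {unexplained} of {TOTAL} ({mismatch_rate:.1%})")
print(f"detections without eigenray: {missed_detection_share:.1%}")
print(f"non-detections without shadow zone: {unexplained_nondetection_share:.1%}")
```

Under these assumptions the three rates come out at roughly 18%, 30% and 35%, respectively.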

Table 3 List of Ariane 5 engine tests for which a stratospheric duct exists based on the atmospheric model (ECMWF) (veff-ratio > 1), but for which no signals could be identified at IS26

5 Discussion

Of specific interest in the interpretation of infrasound from the Ariane 5 rocket engine tests are the cases where the veff-ratio enables a stratospheric duct in general, but a receiver still cannot detect a stratospheric arrival (false positive in Fig. 8a). In this scenario we presume that the stratosphere is equally transparent to atmospheric waves over a larger frequency range, which is mostly the case (Sutherland and Bass 2004; Waxler et al. 2017a). Such scenarios often arise when the range and location of the associated shadow zones (Gutenberg 1939) vary strongly, which is particularly pronounced at regional distances, so that detections or non-detections may occasionally occur at a specific range like the 320 km distance studied here.

Of the 20 Ariane 5 tests where no signal could be identified in the IS26 waveform data despite veff-ratio > 1 (false-positive cases in Fig. 8a; see Table 3 for a list of these cases), 13 tests are associated with a shadow zone according to the ECMWF and propagation models, hence the fewer false-positive and more true-negative cases in Fig. 8b. The remaining seven tests do not exhibit such a zone; thus signals are expected to occur, as is demonstrated in Fig. 9. The propagation modeling shows, as indicated by the 1D effective sound speed ratio, a suitable stratospheric duct and rays bouncing between the ground and the middle atmosphere. In all seven cases, we see a stratospheric shadow zone within 120 to 150 km distance from the source, with an occasional second shadow zone at twice this distance. However, for the relevant distance of IS26 beyond 300 km, an eigenray exists between source and receiver in all of the demonstrated cases. The noise level at the station during the seven cases of interest is also not a suitable criterion to explain the non-detections. In six of the seven cases, the noise level is well below the threshold of average noise that separates detectability from non-detectability (see “Data Analysis”). In the seventh case (test case number 004, shown in Fig. 9b), it is just 15% above the limit and coincides with a high veff-ratio, which should allow the signal energy to propagate from the source to the receiver with little attenuation and thus with sufficiently high SNR.

Fig. 9 Two-dimensional ray trace propagation modeling between the DLR rocket engine test facility P5 (axes origin) and infrasound array IS26 (white triangle) for the seven non-detection cases. Upper stratosphere layers with effective sound speed values (veff, color-coded) larger than the effective sound speed near the ground (i.e., effective sound speed ratio > 1) provide suitable stratospheric ducting conditions without a shadow zone near the station. The closest eigenray is provided in green. Case numbers refer to the Test ID# listed in Table 3

The use of 2D ray tracing methods for the presented test cases is, in our opinion, sufficient to explain the source-to-receiver propagation at regional distances of 320 km, where 3D effects play a minor role (e.g., Lalande et al. 2012 state this for distances below 700 km). Cross-wind deviations from the source-receiver azimuth are on the order of ± 5° (see Fig. 3), thus indicating that little difference is to be expected between 2D and 3D modeling. Application of full-wave modeling might support identifying signal characteristics and the effects of propagation on the frequency content, but is not considered in the present study. Instead, a period of mostly 10 min of increased infrasound signal amplitude (see Fig. 2), characteristic of the rocket engine test signature that propagated from the test facility to the receiver, can be identified (or not).

Lastly, atmospheric gravity waves and small-scale disturbances are often poorly represented in global circulation models. For instance, Preusse et al. (2014) note that the ECMWF model underestimates the short horizontal wavelength part of the gravity wave momentum flux spectrum, whereas it overestimates the longer wavelength part. Moreover, the sponge layer integrated in the ECMWF model at altitudes > 45 km, which damps waves to avoid unrealistic reflections at the model top, further degrades the representation of gravity waves in the upper stratosphere (e.g., Ehard et al. 2018). However, while gravity wave perturbations have proven effective in explaining detections within shadow zones, because the fine structure introduced within the stratospheric model often enables eigenrays to reach recording stations that are otherwise missed (e.g., Pilger et al. 2013), we have not seen the opposite case (i.e., gravity wave perturbations reducing detectability). This can be explained by the tendency of this approach not necessarily to shift a shadow zone, but rather to narrow or close it, so that stations within shadow zones are reached by eigenrays (Hedlin and Drob 2014). A similar argument can be made for parabolic equation modeling (Waxler et al. 2017b), with its ability to illuminate regions for which ray tracing cannot provide adequate eigenrays. The number of cases where a shadow zone is modeled depends on the ray tracing parameters, as described in “Atmospheric and Propagation Modeling”. Increasing the ray density and the tolerance for the horizontal distance of an eigenray to the receiver would lead to more cases of successful, but not necessarily more correct, source-to-receiver modeling. Therefore, the tolerance is fixed at a maximum of 20 km to take into account any station and model uncertainty, which also serves as a first approximation of gravity wave perturbations not covered in further detail in this study.
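The effect of the eigenray tolerance can be illustrated with a minimal sketch. It assumes that a ray tracer has already provided the horizontal ranges at which rays return to the ground, and it treats a case as modeled if any return lies within the tolerance of the receiver range. The function names and the example ground-return ranges are hypothetical; only the 320 km receiver distance and the 20 km tolerance are taken from the text.

```python
def has_eigenray(ground_returns_km, receiver_km=320.0, tolerance_km=20.0):
    """True if any ray returns to the ground within the tolerance of the receiver."""
    return any(abs(r - receiver_km) <= tolerance_km for r in ground_returns_km)

def closest_miss(ground_returns_km, receiver_km=320.0):
    """Horizontal distance between the receiver and the nearest ground return."""
    return min(abs(r - receiver_km) for r in ground_returns_km)

# Hypothetical ground-return ranges (km) of traced rays, not values from this study
returns_km = [140.0, 285.0, 430.0]

print("closest miss (km):", closest_miss(returns_km))
print("eigenray with 20 km tolerance:", has_eigenray(returns_km))
print("eigenray with 40 km tolerance:", has_eigenray(returns_km, tolerance_km=40.0))
```

In this example the nearest return misses the receiver by 35 km, so the case counts as a shadow zone with the 20 km tolerance but as a successfully modeled arrival with a 40 km tolerance. This is the sense in which a wider tolerance produces more successful, but not necessarily more correct, modeling results.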

Investigations on the detectability of infrasound from repetitive sources at regional ranges of hundreds of kilometers have already been the focus of earlier studies (Le Pichon et al. 2005, 2006, 2010; Schwaiger et al. 2020), mostly investigating repetitive ocean swell and volcanic activity. Schwaiger et al. (2020) provide a study similar to ours, investigating 70 different events of volcanic activity from Bogoslof volcano, Alaska, and comparing infrasound array observations at regional distances with atmospheric and propagation modeling. When estimating the performance in correctly predicting detections or non-detections, they see a rate similar to ours: two thirds of cases in agreement versus one third of cases where prediction and detection do not match. The influence of atmospheric dynamics on infrasound propagation and signal detectability was also a major topic of interest within the ARISE project (Assink et al. 2014; Blanc et al. 2018; Smets et al. 2016). However, these studies were mostly restricted to natural, recurrent sources with neither full ground-truth information nor a stable amplitude or signal duration. A few other studies (Blixt et al. 2019; Gainville et al. 2010; Gibbons et al. 2015; Pilger et al. 2021) focused on repetitive, anthropogenic infrasound events from identical locations, such as military test explosion series, ammunition destruction sites or rocket launch facilities, but mostly without a stable experimental setup and fixed source energy. Therefore, a series of events with known and fixed location, duration and intensity, like the one presented in this study over the seasonal changes of two decades, is a unique dataset that allows the general modeling performance to be estimated in comparison with observations at the same, nearly unchanged station.

6 Conclusions

A repeatable and well-defined infrasound ground-truth source has been identified and studied using 20 years of infrasound observations at IMS station IS26 at a regional distance of about 320 km. It provides a consistent and controlled environment compared to other sources of infrasound such as rocket launches (Pilger et al. 2021), accidental explosions (Campus 2004; Campus and Christie 2010) or natural phenomena (Le Pichon et al. 2005, 2006). Within this regional distance, stratospheric wave propagation develops, including the occurrence of acoustic shadow zones. This phenomenon has therefore been observed over a larger range of atmospheric states and is clearly reflected by the two main seasons, summer and winter, which are separated by the spring and fall equinoxes.

Based on the large number of 159 test events (completely listed in Table 4 with a summary of detection and modeling results), we have studied our ability to detect infrasound signals from the ground-truth source and how it correlates with the development of a stratospheric duct. This duct is consistently absent for the path considered, from west-northwest to east-southeast, during the summer months, as reflected by a fairly stable effective sound speed ratio below 0.9. During equinox times this ratio changes regularly to values above 0.95 and mostly above 1.0, enabling stratospheric wave propagation. During the winter months the effective sound speed ratio is highly variable, reaching values up to 1.5. Even though stratospheric ducting is thus possible in principle, it does not necessarily lead to waves that reach an infrasound station if the latter is located in a shadow zone. Therefore, of the nearly 90 cases with atmospheric conditions suitable to observe a stratospheric arrival, we find 20 cases without signal detection at IS26 (i.e., false alarms). While two thirds of these cases can be attributed to the occurrence of an acoustic shadow zone, we find one third of cases where the atmospheric model fails to produce a shadow zone and therefore fails to explain the lack of an arrival (false-positive cases). In these cases, however, we do not observe increased levels of background noise.

Table 4 Dataset for the 159 selected engine tests (see supplementary material) with column entries for the test number (“Test no.”), start time (“Test Date/Hour”, Year Month Day Hour, in UTC), quality of detection (“D”, 0: no detection, 1: poor, 2: weak, 3: good), effective sound speed ratio (“Veff ratio”), eigenray model (“ER”, 0: no, 1: yes)

On the other hand, for the 71 detections out of the 159 tests we note that at least 21 observations were made when the propagation model showed a shadow zone (false-negative cases). Together with the previous non-detection cases we can hypothesize that atmospheric models cannot completely explain infrasound observations at regional distances in 20 to 30% of the cases. Such a finding is supported by recent studies of Fuchs et al. (2019) and Koch and Pilger (2020) for an explosion source in roughly the same area of Central Europe, where strong evidence for stratospheric arrivals was found, but propagation modeling failed to support these findings.

This study provides an important insight into the expectation, validity and usefulness of atmospheric background and propagation modeling to explain detections and non-detections of infrasonic signals of interest at regional distances. This has important implications for determining station locations, e.g., in the context of new installations to monitor potential infrasound sources like volcanoes and explosion sites. It also suggests that further research is necessary to better understand and quantify the (in-)accuracies of atmospheric background and propagation characterization, with the ultimate aim of improving the modeling used.