1 Introduction

Allan-Werle-plots have become a major tool in infrared spectroscopy applications for the characterization of instrumental stability and performance for field experiments [1,2,3,4,5,6,7]. Werle et al. [8] originally applied the two-sample-variance, or Allan Variance, to tunable diode laser absorption spectroscopy (TDLAS) to determine the maximum integration time and minimum achievable precision at this integration time.

However, a large maximum integration time is not necessarily needed in field experiments. For many species, the atmospheric variability is much higher than the instrument uncertainty [9]. In this case, averaging the data to a lower time resolution filters out atmospheric features of the time series. In field experiments aboard a moving platform, e.g. aircraft-borne or drone-borne field experiments, a high temporal resolution is needed to enable localization of the measurements [10] and further data-driven analysis, e.g. calculation of fluxes via eddy covariance and wavelet transforms [11,12,13]. Furthermore, instrumental properties determined in the laboratory or in a stationary environment may not be transferable to in-flight performance [14].

According to Werle (2011) [15], many more features of the instrumental temporal dependencies can be extracted from an Allan-Werle-plot. Apart from uncorrelated white noise and random walk drifts, other noise contributions, e.g. flicker noise and sinusoidal interferences, can be determined. This study characterizes the instrumental temporal dependencies of the TRISTAR (tracer in-situ TDLAS for atmospheric research) instrument described in [2, 16,17,18] and investigates differences in the performance during ground and airborne operation. Using the information provided via Allan-Werle-plots, suitable repetition times of in-flight calibrations and their implications for measurement uncertainty estimation are discussed.

2 The TRISTAR instrument

The TRISTAR instrument was originally designed by Wienhold et al. (1998) [16] as a three-channel TDLAS that combines three laser beam paths via beam splitters [16] and, in a later design, via pneumatically moved mirrors [17]. The combined beam path is guided through a double corner cube White cell [16, 19] with a base length of \(0.5\,\)m and 128 passes. The outgoing beam is split into a signal and a reference beam, and both are focused onto individual detectors. A reference cell filled with a high concentration of the target gas is inserted into the reference beam path to enable line determination and line locking via laser current offset adaption. The instrument has been successfully deployed in various aircraft campaigns [2, 20,21,22] aboard a Learjet 35A and the High Altitude Long-range (HALO) Gulfstream G550 research aircraft. During these campaigns a range of trace gases has been measured including CO, CH\(_4\), HCHO, NO\(_2\) and N\(_2\)O. Measurement uncertainties of \(0.5\,\%\), \(0.5\,\%\) and \(0.7\,\%\) have been achieved in earlier studies for CO, N\(_2\)O and CH\(_4\), respectively [2].

During 25 years of operation, various modifications have been applied to the original setup, including a major revision since the last reported experimental setup by Tadic et al. (2017) [18]. The cryogenic cooling system has been eliminated by replacing the individual components that were cooled via liquid nitrogen, i.e. lasers and detectors, with thermoelectrically cooled components. The three-laser setup has been replaced by single-laser operation, which increases data coverage of the target species. While these measures might decrease the optimal performance and limit the diversity of data output, simpler instrument maintenance and more autonomous operation can be achieved. The latter is enabled by substituting the V25 electronics and DOS PC with a National Instruments real time controller paired with an ARK Windows PC running a LabView real time application and graphical user interface. Establishing a PC-based user interface enables remote operation via internet and reduces the required training time for airborne operators in aircraft measurement campaigns. Autonomous and remote operation has been demonstrated during the ground-based field experiment at the German Taunus Observatory in 2021 [23], requiring only maintenance and calibration gas source exchange as on-site activities.

2.1 Optics and gas flow

A room temperature quantum cascade laser (RT-QCLAS) inside a high heat load (HHL) housing (AlpesLasers, Switzerland) is driven in cw-mode and emits a divergent beam. The beam is captured and focused onto a first focal point via three-mirror telescopic optics (IPM Freiburg). Via a pair of an off-axis parabolic (OAP) and elliptic (OAE) mirror the beam is converted into a parallel beam with a width of approximately \(14\,\)mm. The large beam waist makes the setup less sensitive to aerosol particles, imperfections on the mirrors and turbulence in the optical path.

Several planar mirrors guide the path to the entrance of the White cell. There the parallel beam is focused to a second focal point using another pair of an OAE and OAP mirror. Then the divergent beam is injected through wedged CaF\(_2\) windows into the double corner cube White cell, where it gets reflected 128 times.

The White cell has a length of \(0.5\,\)m and holds a volume of \(3\,\)l. The cell is drained by a scroll vacuum pump set to a constant gas flow of \(3\,\)slm (standard liters per minute). On the top of the cell a movable valve operated by a step motor regulates the cell pressure to \(50\,\)hPa. The inlet is connected to a T-junction, where calibration gas can be injected by a mass flow controller (Bronkhorst). Excess calibration gas exits through the ambient air inlet. A detailed schematic of the tubing is given in Tadic et al. (2017) [18], Figure 2.

The outgoing beam is focused on a third focal point and converted back into a parallel beam using another pair of OAE and OAP mirrors. The parallel beam is split via a CaF\(_2\) beam splitter (50:50) into signal and reference beam. The reference beam is further passed through a reference cell filled with high concentration of the target gas. Both parallel beams are focused via coated CaF\(_2\) lenses with focal length of \(40\,\)mm onto room-temperature photovoltaic infrared detectors (VIGO systems, Poland). All optical components are mounted on a rigid aluminum plate that is electrically heated to ensure thermal stability. This makes the optical setup less sensitive to distance variations caused by thermal fluctuations.

2.2 Electronics and data processing

The instrument runs autonomously inside a NI LabView real time application on the real time controller (NI cRIO 9038). The software controls pump, inlet and MFC and writes data to disk. Through the FPGA module the scanning ramp and modulation is set at a clock frequency of \(250\,\)kHz with a modulation frequency of \(31.25\,\)kHz and a scanning frequency of 60 ramps per second. The ramp and modulation are input to the laser controller that operates the QCL at a set temperature and laser base current to obtain a laser frequency near the absorption central frequency.

The detector output is digitally demodulated at twice the modulation frequency for 2f wavelength modulation spectroscopy (WMS). The raw signal is down-sampled by a CIC-stage to reduce the amount of data. Individual spectra of the increasing (up) and the decreasing (down) part of the triangular scanning frequency are averaged separately to obtain a time resolution of \(1\,\)Hz. The averaged up and down spectra are saved to disk and further processed for online operation.
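The 2f demodulation step can be illustrated with a minimal digital lock-in sketch. It uses the clock and modulation frequencies quoted above; the function name `demodulate_2f`, the plain averaging used as low-pass and the synthetic test signal are assumptions made for illustration only, since the FPGA implementation and the CIC stage are not specified in the text.

```python
import numpy as np

def demodulate_2f(signal, fs=250e3, f_mod=31.25e3):
    """Amplitude of the signal component at twice the modulation frequency.

    Hypothetical helper: mixes the signal with in-phase and quadrature
    references at 2*f_mod and averages over the record (simple low-pass).
    """
    signal = np.asarray(signal, dtype=float)
    n = np.arange(signal.size)
    phase = 2.0 * np.pi * (2.0 * f_mod) * n / fs
    x = 2.0 * np.mean(signal * np.cos(phase))  # in-phase part
    y = 2.0 * np.mean(signal * np.sin(phase))  # quadrature part
    return np.hypot(x, y)

# Synthetic detector signal (assumed values): a strong 1f component plus a
# weaker 2f component with arbitrary phase, sampled over whole periods.
fs, f_mod = 250e3, 31.25e3
t = np.arange(4000) / fs
test_signal = (1.0 * np.sin(2 * np.pi * f_mod * t)
               + 0.3 * np.cos(2 * np.pi * 2 * f_mod * t + 0.4))
amp_2f = demodulate_2f(test_signal, fs, f_mod)
```

Using both quadratures makes the recovered amplitude independent of the detection phase; a real implementation would typically fix the phase and apply a proper low-pass filter instead of a block average.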

The described configuration of modulation frequency, scanning frequency and time resolution can be adjusted to the needs of the experimental requirements.

Throughout the experiments carried out in this study, the QCL is operated near \(2190\,\)cm\(^{-1}\), where a strong line pair of CO and N\(_2\)O is located [24]. The corresponding line strength is \(2.879\cdot 10^{-19}\,\)cm\(^{-1}/(\)molec\(\cdot\)cm\(^{-2})\) [24, 25]. The scanning range is optimized to enable simultaneous measurements of both species.

The saved data is post-processed using custom code written in IGOR Pro (Wavemetrics). The raw signal is filtered using an IIR low-pass filter, the line centers and widths are determined from the reference signal, and the spectra are linearly fit against the most stable absorption spectrum obtained during a calibration measurement.
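The linear fit against a calibration spectrum can be sketched as an ordinary least-squares problem. This is an illustration in Python rather than the IGOR Pro code used by the authors; the function name `fit_against_reference`, the inclusion of a constant offset and the synthetic line shape are assumptions.

```python
import numpy as np

def fit_against_reference(spectrum, reference):
    """Least-squares fit: spectrum ~ scale * reference + offset."""
    spectrum = np.asarray(spectrum, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Design matrix with the reference spectrum and a constant column.
    design = np.column_stack([reference, np.ones_like(reference)])
    (scale, offset), *_ = np.linalg.lstsq(design, spectrum, rcond=None)
    return scale, offset

# Synthetic 2f-like line shape (assumed): negative second derivative
# of a Gaussian on an arbitrary scan axis.
x = np.linspace(-5.0, 5.0, 201)
reference = (1.0 - x**2) * np.exp(-x**2 / 2.0)
spectrum = 0.5 * reference + 0.02
scale, offset = fit_against_reference(spectrum, reference)
```

Under the assumption of a linear response, the fitted `scale` multiplied by the calibration gas mixing ratio gives the mixing ratio of the sample spectrum.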

2.3 Instrumental performance measures

The total measurement uncertainty is composed of several metrics that each contribute differently to the limitation of the measurement device. The systematic error is defined by the accuracy and describes the overall deviation of all data points from the true value. Since the TRISTAR device uses the relative measurement mode of WMS, the uncertainty of the conversion factor from arbitrary units to mixing ratio contributes to the accuracy. The accuracy depends on the calibration gas mixing ratio uncertainty as determined in the laboratory or provided by the gas standard supplier.

The systematic deviations caused by drifts can be mitigated by regular calibrations. Considering changes in relative sensitivity, the drifts can be corrected via linear interpolation of the reciprocal mixing ratios. This interpolation scheme requires that the drift is slow compared to the interval between calibrations, so that it is approximately linear between two consecutive calibrations. The reproducibility \(\textrm{R}\) refers to the relative spread of the calibrations over a complete measurement. This uncertainty cannot be lowered by further averaging.

The precision \(\textrm{P}\) refers to the relative statistical error at a given integration time after drift correction. It is obtained from the relative standard deviation of the calibration measurements. The total relative error at native time resolution is a combination of \(\textrm{P}\) and \(\textrm{R}\):

$$\begin{aligned} \frac{\sigma _\chi }{\chi } = \sqrt{\textrm{P}^2 + \textrm{R}^2} \end{aligned}$$
(1)

The detection limit refers to the lowest mixing ratio distinguishable from noise. It is estimated from the standard deviation of absorption-free gas injected into the absorption cell. By interpreting the precision as a purely absolute error, the precision at the mixing ratio of the calibration gas gives an upper bound, and thus a worst-case estimate, of the detection limit.

Werle et al. [8] define the Allan variance as half the mean squared difference between adjacent data batches for different integration times \(\tau\), where \(M\) denotes the number of batches:

$$\begin{aligned} \sigma _\chi (\tau )^2 =&\ \frac{1}{2(M-1)} \sum _{k=0}^{M-2} \left( \chi _{k+1}(\tau ) - \chi _k(\tau )\right) ^2 \end{aligned}$$
(2)
$$\begin{aligned} \chi _{k}(\tau ) =&\ \frac{1}{\tau }\int _{t=k\tau }^{(k+1)\tau } \chi (t)dt \end{aligned}$$
(3)
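For evenly sampled data, the batch averages and the Allan variance can be computed in a few lines. The following is a minimal sketch of the non-overlapping estimator, taken as half the mean squared difference between adjacent batch means; the function names `allan_variance` and `allan_curve` and the logarithmic grid of batch sizes are assumptions, not the authors' code.

```python
import numpy as np

def allan_variance(x, m):
    """Non-overlapping Allan variance for batches of m consecutive samples."""
    x = np.asarray(x, dtype=float)
    n_batches = x.size // m
    if n_batches < 2:
        raise ValueError("need at least two complete batches")
    # Batch means chi_k(tau) with tau = m * sampling interval.
    batch_means = x[: n_batches * m].reshape(n_batches, m).mean(axis=1)
    # Half the mean squared difference of adjacent batch means.
    return 0.5 * np.mean(np.diff(batch_means) ** 2)

def allan_curve(x, dt=1.0, n_points=30):
    """Integration times and Allan variances on a logarithmic grid."""
    x = np.asarray(x, dtype=float)
    ms = np.unique(np.logspace(0, np.log10(x.size // 2), n_points).astype(int))
    return ms * dt, np.array([allan_variance(x, m) for m in ms])

# Example with pure white noise: the variance falls off roughly as 1/tau.
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 1.0, 100_000)
taus, avars = allan_curve(noise)
```

Plotting `avars` against `taus` on double-logarithmic axes yields the Allan-Werle-plot; overlapping estimators give smoother curves but are not shown here.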

As reported in [15], plotting the Allan variance as a function of the integration time (Allan-Werle-plot) provides insight into the temporal characteristics of an absorption spectroscopy measurement and serves as an alternative to other frequency domain analysis tools like Fourier transforms. The maximum integration time \(\tau _{\max }\) refers to the minimum in the Allan-Werle-plot where the trade-off between the decrease from the integration of white noise and the build-up from integration of drifts is optimal. Considering only uncorrelated white noise \(\sigma _W\) and random walk noise \(\sigma _B\) the Allan-variance \(\sigma ^2\) and \(\tau _{\max }\) can be modeled as

$$\begin{aligned} \sigma ^2(\tau ) =&\ \frac{\sigma _W^2}{\tau } + \frac{\sigma _B^2}{2}\tau \ \textrm{and} \end{aligned}$$
(4)
$$\begin{aligned} \tau _{\max } =&\ \sqrt{2}\frac{\sigma _W}{\sigma _B}, \end{aligned}$$
(5)

respectively.
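Equation (5) follows from setting the derivative of (4) with respect to \(\tau\) to zero. The short sketch below evaluates both expressions; the function names are illustrative, and the example values are the fit parameters reported in the caption of Fig. 1, with \(\tau\) in seconds at the native 1 Hz resolution.

```python
import math

def allan_model(tau, sigma_w, sigma_b):
    """Eq. (4): white noise plus random walk contribution."""
    return sigma_w**2 / tau + 0.5 * sigma_b**2 * tau

def tau_max(sigma_w, sigma_b):
    """Eq. (5): integration time at the minimum of Eq. (4)."""
    return math.sqrt(2.0) * sigma_w / sigma_b

# Fit parameters from the caption of Fig. 1 (ppbv).
sigma_w, sigma_b = 0.4293, 0.0309
t_opt = tau_max(sigma_w, sigma_b)  # about 19.6 s, i.e. roughly 20 s
var_min = allan_model(t_opt, sigma_w, sigma_b)
```

At `t_opt` both terms of the model contribute equally, which is a quick consistency check on a fitted curve.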

3 Ground operation inside the HALO aircraft

The TRISTAR instrument is part of the TRIHOP rack that also carries the hydrogen peroxide and higher organic peroxide monitor HYPHOP [26]. The rack is mounted into the research aircraft HALO. The inlet is connected to a T-junction that may provide scrubbed cabin air during zero cycles. Scrubbing is shut off during CO measurements. The calibration gas tube is connected to a \(6\,\)l composite bottle (AUER). During ground operation the measurement devices run on external power while the aircraft auxiliary power unit and main engine remain shut off. As a consequence, the integrated air conditioning of the aircraft also remains off and cooling has to be provided externally by ventilation. This led to a slowly increasing temperature inside the aircraft over a whole day of ground operation, reaching a cabin temperature of up to 35 \(^\circ\)C.

3.1 Performance of calibration gas measurement

Several functional tests were carried out during ground operation. Determining the precision and long-term stability of the instrument is important to estimate the expected performance during the research flights. In order to collect suitable data, the cell was flushed with calibration gas regularly and also for longer time periods. Extended measurements of calibration gas provide a good estimate of the statistical measurement uncertainty directly from the standard deviation. A time series and the corresponding Allan-Werle-plot of such an extended period of calibration gas measurement are shown in Fig. 1. Calibration gas with a CO mixing ratio of \(155.6\,\)ppbv was injected over a period of 2 h. The absolute and relative standard deviation of the measurement are \(1.7\,\)ppbv and \(1.1\,\%\), the latter value relative to the mixing ratio of the calibration gas. The Allan-variance \(\sigma ^2\)(\(\tau\)) can successfully be modeled by white noise \(\sigma _W\) and random walk drift \(\sigma _B\) contributions alone. The optimal fit parameters are given in the figure caption of Fig. 1 and the maximum integration time amounts to \(20\,\)s according to (5).

Fig. 1

Calibration gas measurement during ground operation inside the HALO aircraft. Top: time series of the 2 h segment of calibration gas. Bottom: Corresponding Allan-Werle-plot with fit to (4). The optimal fit parameters are \(\sigma _W\) = \((0.4293 \pm 0.0022)\,\)ppbv and \(\sigma _B\) = \((0.0309 \pm 0.0004)\,\)ppbv

3.2 Performance of ambient gas measurement

During ambient measurements, however, the measurement uncertainty cannot be determined as described in the previous section, since the standard deviation is a mixture of atmospheric variability and instrument uncertainty. Instrument drifts are also hard to distinguish from changes in the mixing ratios of the sampled air masses. Determining an appropriate interval between calibrations that minimizes the effect of drifts and allows estimation of the measurement uncertainty is therefore a non-trivial task during field experiments, especially airborne ones.

Using the Allan-Werle-plot in Fig. 1, similar data can be simulated that shares the temporal characteristics obtained from the measurement. This can be achieved with the following simulation setup, where \(\chi _0\) refers to the mixing ratio of the calibration gas.

$$\begin{aligned} h_t =&\ h_{t-1} + \epsilon _B \end{aligned}$$
(6)
$$\begin{aligned} \chi _t =&\ \chi _0 + h_t + \epsilon _W \nonumber \\ \textrm{with}&\ \epsilon _B \sim \mathcal {N}(0,\sigma _B^2), \nonumber \\&\ \epsilon _W \sim \mathcal {N}(0,\sigma _W^2) \end{aligned}$$
(7)
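Equations (6) and (7) translate directly into a cumulative sum of Gaussian increments plus independent Gaussian noise. The sketch below assumes a 1 Hz time step and a random walk starting at zero; the function name `simulate_series` and the `seed` argument are illustrative choices.

```python
import numpy as np

def simulate_series(n, chi0, sigma_w, sigma_b, seed=None):
    """Simulate n samples of white noise plus random walk around chi0."""
    rng = np.random.default_rng(seed)
    # Eq. (6): random walk h_t as the running sum of increments eps_B.
    h = np.cumsum(rng.normal(0.0, sigma_b, size=n))
    # Eq. (7): calibration gas value plus drift plus white noise eps_W.
    return chi0 + h + rng.normal(0.0, sigma_w, size=n)

# Two hours at 1 Hz with the fit parameters from the caption of Fig. 1.
series = simulate_series(7200, chi0=155.6, sigma_w=0.4293,
                         sigma_b=0.0309, seed=0)
```

Because the random walk is a single realization, individual simulated series can differ strongly from each other, which is why repeated simulations are needed for stable statistics.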

A resulting time series can be found in Fig. 2. Starting from this simulated data, points can be artificially flagged as calibration and ambient to simulate the effect of different intervals between calibrations during a real field measurement scenario. The effects on the correction of the drifts and the estimation of the true measurement uncertainty can be studied. For each individual simulation, \(\textrm{R}\) is calculated from the spread of the means of the flagged calibrations. The reciprocal means are interpolated to estimate the sensitivity drift. The amplitudes are multiplied by the interpolated sensitivity to obtain the drift-corrected amplitudes. \(\textrm{P}\) is obtained from the relative standard deviation of the drift-corrected calibration amplitudes. The relative standard deviation of the corrected amplitudes that are flagged ambient, on the other hand, gives a good estimate of the residual measurement uncertainty (RMU) due to drifts and noise present in the data. An optimal interval between calibrations will decrease the RMU until P is reached. The estimated measurement uncertainty (EMU) as defined in (1) should optimally be close to the true relative standard deviation of the simulated dataset.
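The procedure described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: the function name `calibration_metrics`, the calibration duration `cal_len` of 60 samples and the constant extrapolation of the sensitivity outside the first and last calibration are assumptions.

```python
import numpy as np

def calibration_metrics(chi, chi0, interval, cal_len=60):
    """Relative R, P, RMU and EMU for one calibration schedule.

    chi: time series (one sample per step); chi0: calibration gas value;
    interval: samples between calibration starts; cal_len: samples per
    calibration (hypothetical default).
    """
    chi = np.asarray(chi, dtype=float)
    if interval <= cal_len:
        raise ValueError("interval must exceed cal_len")
    starts = np.arange(0, chi.size - cal_len + 1, interval)
    if starts.size < 2:
        raise ValueError("need at least two complete calibrations")

    # Flag calibration samples; everything else counts as ambient.
    is_cal = np.zeros(chi.size, dtype=bool)
    for s in starts:
        is_cal[s:s + cal_len] = True

    # R: relative spread of the calibration means.
    means = np.array([chi[s:s + cal_len].mean() for s in starts])
    r = means.std(ddof=1) / means.mean()

    # Interpolate the reciprocal means to a sensitivity for every sample
    # (np.interp holds the end values constant outside the calibrations).
    centers = starts + (cal_len - 1) / 2.0
    sensitivity = np.interp(np.arange(chi.size), centers, chi0 / means)
    corrected = chi * sensitivity

    # P from corrected calibration data, RMU from corrected ambient data.
    p = corrected[is_cal].std(ddof=1) / corrected[is_cal].mean()
    rmu = corrected[~is_cal].std(ddof=1) / corrected[~is_cal].mean()
    return {"R": r, "P": p, "RMU": rmu, "EMU": float(np.hypot(p, r))}

# Example: simulated series following Eqs. (6)-(7), calibration every 30 min.
rng = np.random.default_rng(0)
n, chi0 = 7200, 155.6
series = chi0 + np.cumsum(rng.normal(0, 0.0309, n)) + rng.normal(0, 0.4293, n)
metrics = calibration_metrics(series, chi0, interval=1800)
```

Repeating the call for many simulated series and a range of `interval` values gives the mean and spread of each metric as a function of the calibration interval.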

Fig. 2

Example simulation process carried out to retrieve the metrics P, R, RMU and EMU. Top left: Simulated time series with flagged ambient (red) and calibration (blue) segments. Bottom left: individual means of flagged calibrations. The standard deviation of these values is used to calculate R. Top right: Interpolated sensitivity that is connected to the reciprocal of the calibration means. Bottom right: Drift-corrected time series. The standard deviation of the data flagged calibration yields P, the standard deviation of the data flagged ambient yields RMU

Figure 3 shows the described metrics for different intervals between calibrations. The observable features match the expectations: If the calibration frequency is too low, the drifts cannot be successfully corrected. Thus, the RMU is close to the original measurement uncertainty. Furthermore, the EMU is very unstable since there are too few calibration measurements in total. With more calibrations both of these effects are reduced. The EMU approximates the original measurement uncertainty better and the RMU decreases. In principle, more calibrations are always advantageous. However, more calibrations also result in fewer ambient measurements and a high calibration gas consumption. An empirical choice is made that subdivides the calibration frequencies into three regimes: Too many calibrations yield only diminishing returns while decreasing the duty cycle, too few calibrations cause the effects described at the beginning of this paragraph, and the appropriate choice lies in the middle. The appropriate region is chosen between \(10\tau _{\max }\) and \(100\tau _{\max }\), where \(\tau _{\max }\) refers to the maximum integration time derived from the Allan-Werle-plot.
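As a small worked example, the empirical window can be evaluated for the maximum integration time of \(20\,\)s found during ground operation. The helper name `calibration_window` is hypothetical; the factors 10 and 100 are the empirical bounds stated above.

```python
def calibration_window(tau_max_s, lower=10.0, upper=100.0):
    """Empirical range of calibration intervals in seconds."""
    return lower * tau_max_s, upper * tau_max_s

# tau_max of about 20 s from the ground measurement.
low_s, high_s = calibration_window(20.0)   # 200 s to 2000 s
interval_s = 30 * 60                       # candidate interval of 30 min
in_window = low_s <= interval_s <= high_s
```

The window therefore spans roughly 3 min to 33 min, so an interval of 30 min sits close to its upper end.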

Fig. 3

Performance metrics for different intervals between calibrations. The simulation was repeated 100 times. The solid lines indicate the mean values and the shaded areas indicate the standard deviations of each metric for the 100 repetitions. The metrics shown are R (green), P (blue), RMU (red) and EMU (yellow). The black horizontal line shows the total measurement uncertainty of the simulated time series. The black vertical lines separate the three regimes a): too many calibrations, b): appropriate interval between calibrations and c): too few calibrations. The spikes in the results originate from discrete changes in the number of calibrations that fall inside the total number of data points in the time series for each given calibration interval

Accordingly a calibration interval of \(30\,\)min lies within the desired regime. On another day of ground operation, regular calibrations against a gas bottle with a CO mixing ratio of \(312\,\)ppbv were carried out at that interval. The resulting values for R and P are \(1.0\,\%\) (3.08 ppbv) and \(0.2\,\%\) (0.65 ppbv), respectively, and accurately estimate the relative total measurement uncertainty. Furthermore, a successful correction for the random walk drifts can be assumed considering the results of the simulation shown in Fig. 3. The residual measurement uncertainty of the ambient data can therefore be estimated with the precision P, reducing the uncertainty from a value of \(1.1\,\%\) (3.15 ppbv) to \(0.2\,\%\) (0.65 ppbv) as a consequence of regular calibrations.

4 Airborne operation

In the previous section, the instrument has been characterized during ground operation aboard the HALO aircraft. An appropriate interval for calibrations has been chosen that effectively corrects for random walk drifts and allows for a reduction of the measurement uncertainty. Now the question arises: Is the ground-based performance transferable to airborne operation?

In order to study the instrument response to disturbances caused by airborne operation, e.g., vibration, acceleration, orientation and pressure, a test flight was performed where TRISTAR was constantly measuring calibration gas with a mixing ratio of \(312\,\)ppbv. The data acquired can be used to qualitatively and quantitatively analyze the effects of this new environment to the instrumental performance.

Figure 4 shows the time series and the corresponding Allan-Werle-plot of this test flight. The time series indicates different environmental conditions throughout the flight. During takeoff and landing periods, rapid accelerations impact the optical setup and lead to distortions in the signal. At high altitudes, the pressure regulation of the inlet starts to oscillate, which results in oscillations of the measured gas concentration. In post-processing these oscillations can be corrected by assuming a linear dependence on the cell pressure. On intermediate flight levels the system is more stable. Therefore, calculating the Allan variance for the complete data set is no longer meaningful, as the experimental conditions and the underlying distribution are changing. Thus, the Allan variance is calculated individually for the different conditions derived from the time series and metadata.
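A correction that assumes a linear dependence on the cell pressure could look like the following sketch. The text does not give the actual procedure or coefficients, so the function name `remove_pressure_dependence`, the regression against the full segment and all synthetic numbers are assumptions for illustration.

```python
import numpy as np

def remove_pressure_dependence(chi, pressure):
    """Subtract the linear pressure dependence from a calibration gas series."""
    chi = np.asarray(chi, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    # First-order fit: chi ~ slope * pressure + intercept.
    slope, _intercept = np.polyfit(pressure, chi, 1)
    # Remove only the variation around the mean pressure.
    corrected = chi - slope * (pressure - pressure.mean())
    return corrected, slope

# Synthetic segment (assumed values): cell pressure oscillating around
# 50 hPa and a mixing ratio that follows it linearly.
t = np.arange(600)
pressure = 50.0 + 0.5 * np.sin(2 * np.pi * t / 60.0)
chi = 312.0 + 2.0 * (pressure - 50.0)
corrected, slope = remove_pressure_dependence(chi, pressure)
```

Such a regression is only meaningful on calibration gas data or on segments where the true mixing ratio is known to be constant, since ambient variability would otherwise bias the slope.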

Fig. 4

Calibration gas measurement during test flight #01 on the 22nd of November, 2022. The data is segmented according to changes in the environmental conditions into the takeoff and landing phase (yellow), first plateau segment (green), high altitude segment (blue), second plateau segment (grey), complete data set (red). Top: time series of the 5 h flight, separated into these segments, plotted against the left axis and GPS altitude (black) plotted against the right axis. Bottom: Corresponding Allan-Werle-plots with fits to (8). The optimal fit parameters are summarized in Table 1

The Allan variance of the total data set is a combination of the individual variances. This obscures the information that can be extracted from the individual Allan-Werle-plots. The takeoff and landing phase is dominated by a strong sine wave and a drift that are caused by the mechanical system reacting to and recovering from the strong changes in acceleration. The high altitude section is dominated by a strong sine wave that matches the oscillations caused by the pressure regulation. The flight segments on intermediate flight levels are quite stable. During these segments a series of left and right turns and various altitude changes were carried out to study the instrument performance.

Again, the Allan variances are fit against theoretical models to quantify the temporal features. This time the composition is more complex, so pure white noise and random walk are not sufficient to reproduce the measured features. Flicker noise \(\sigma _F\) is added that acts as a noise floor according to [15]. Sinusoidal variations contribute a term proportional to \(A_{\sin }^2\) that depends on the period \(T_{\sin }\). Additionally, constrained random walk noise that behaves like white noise for longer time spans is introduced with amplitude \(A_\textrm{MH}\) and transition time \(T_\textrm{MH}\). The theoretical model results in (8), and for each segment the fit with the smallest number of parameters that still reproduces the data is chosen. The resulting fit parameters are summarized in Table 1.

$$\begin{aligned} \sigma ^2(\tau )&= \frac{\sigma _W^2}{\tau } + \frac{\sigma _B^2}{2}\tau + \sigma _F^2 + A_{\sin }^2 \frac{\sin \left( \pi \frac{\tau }{T_{\sin }} \right) ^4}{\left( \pi \frac{\tau }{T_{\sin }} \right) ^2}\nonumber \\&+ \frac{A_\textrm{MH}^2}{\sqrt{2}} \left( \sqrt{\frac{T_\textrm{MH}-1}{\tau }} + \sqrt{\frac{\tau }{T_\textrm{MH}}}\right) ^{-2} \end{aligned}$$
(8)
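For reference, Eq. (8) can be written as a single function whose optional terms are switched off by setting their amplitude to zero. The function name `allan_model_full` and the default values are illustrative assumptions; the terms themselves are transcribed from (8) as printed.

```python
import numpy as np

def allan_model_full(tau, sigma_w, sigma_b, sigma_f=0.0,
                     a_sin=0.0, t_sin=1.0, a_mh=0.0, t_mh=2.0):
    """Eq. (8): white, random walk, flicker, sinusoidal and constrained
    random walk contributions to the Allan variance."""
    tau = np.asarray(tau, dtype=float)
    white = sigma_w**2 / tau
    walk = 0.5 * sigma_b**2 * tau
    flicker = sigma_f**2                        # constant noise floor
    x = np.pi * tau / t_sin
    sine = a_sin**2 * np.sin(x)**4 / x**2       # sinusoidal interference
    mh = (a_mh**2 / np.sqrt(2.0)) * (
        np.sqrt((t_mh - 1.0) / tau) + np.sqrt(tau / t_mh)) ** -2
    return white + walk + flicker + sine + mh

# With the optional amplitudes at zero the model reduces to Eq. (4).
taus = np.logspace(0, 3, 50)
ground_like = allan_model_full(taus, sigma_w=0.4293, sigma_b=0.0309)
```

A function of this form can be passed to a generic least-squares routine such as `scipy.optimize.curve_fit`; fitting nested variants with fewer active terms is one way to select the model with the smallest number of parameters.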
Table 1 Optimal fit parameters of each Allan-Werle-plot shown in Fig. 4 when fitting against (8). Parameters marked – are neglected since they do not contribute major features of the Allan variance. For comparison, the fit results from ground operation are also appended to the table

During the stable segments the values extracted from the Allan-Werle-plots for the white noise and random walk contributions are similar to the values obtained from the Allan-Werle-plot of the ground operation. This indicates that similar effects influence the instrumental performance in both modes of operation. Features exclusive to airborne operation manifest as additional flicker and sinusoidal contributions during the stable segments. More pronounced features are visible in the time series and enable its segmentation.

Similar to the analysis carried out in the previous section, data was simulated that follows the temporal characteristics of the two stable segments of the flight. The results of the simulation procedure are shown in Fig. 5. Since the Allan variance in these cases contains flicker and sinusoidal components, the results differ from the simulation results of the ground operation. The flicker and sinusoidal components contribute further short-term uncertainties that increase the RMU on shorter periods even with a high frequency of calibrations. Thus, the measurement uncertainty on ambient data can no longer be estimated based on the precision only but needs to be approximated by the EMU (1). Accurate estimation of the EMU is again possible within the appropriate regime of calibration intervals. This regime can again be calculated from (5) and lies between \(10\tau _{\max }\) and \(100\tau _{\max }\). For the first plateau segment, \(\tau _{\max }\) is equal to \(26\,\)s. For the second plateau segment, no value for \(\sigma _B\) could be extracted from the Allan-Werle-plot. A time interval of \(30\,\)min again lies inside the desired range.

Fig. 5

Performance metrics for different intervals between calibrations. The simulation was repeated 100 times. Top: First plateau segment of Testflight #1. Bottom: Second plateau segment of Testflight #1. The solid lines indicate the mean values and the shaded areas indicate the standard deviations of each metric for the 100 repetitions. The metrics shown are R (green), P (blue), RMU (red) and EMU (yellow). The black horizontal line shows the total measurement uncertainty of the simulated time series. The black vertical lines (only applies to top plot) separate the three regimes a): too many calibrations, b): appropriate interval between calibrations and c): too few calibrations. The spikes in the results originate from discrete changes in the number of calibrations that fall inside the total number of data points in the time series for each given calibration interval

During a second test flight, the operation of the TRISTAR instrument was switched to an operation scheme used in real research flights. This way the instrumental behavior during gas switching and pressure regulation at different altitudes was tested. Ambient gas measurements were carried out and regular calibration measurements with a mixing ratio of \(158.8\,\)ppbv were performed for drift correction and measurement uncertainty determination.

The time series of Testflight #2 is shown in Fig. 6. The precision P and reproducibility R as evaluated from the repeated calibrations are equal to \(0.35\,\%\) (0.56 ppbv) and \(1.50\,\%\) (2.38 ppbv), respectively. The estimated measurement uncertainty matches the absolute standard deviation of the subset of Testflight #1 where takeoff and landing phase are discarded. These values are equal to 2.45 ppbv and 2.37 ppbv, respectively.
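The quoted uncertainties can be cross-checked against Eq. (1) by adding precision and reproducibility in quadrature. The sketch below uses the rounded absolute values reported for ground operation and for Testflight #2; the function name `estimated_uncertainty` is illustrative.

```python
import math

def estimated_uncertainty(p, r):
    """Eq. (1): combine precision P and reproducibility R in quadrature."""
    return math.hypot(p, r)

# Absolute values in ppbv as quoted in the text.
emu_ground = estimated_uncertainty(0.65, 3.08)   # about 3.15 ppbv
emu_flight = estimated_uncertainty(0.56, 2.38)   # about 2.445 ppbv
```

Both results agree with the reported values of 3.15 ppbv and 2.45 ppbv within the rounding of the inputs, and in both cases the reproducibility dominates the total.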

Fig. 6

Ambient and calibration gas measurement during Testflight #02 on the 25th of November, 2022. The plot shows the ambient mixing ratio (red), calibration gas mixing ratio (blue) plotted against the left axis and GPS altitude (black) plotted against the right axis

5 Summary and conclusion

Allan-Werle-plots have become a key tool in infrared absorption spectroscopy to investigate the temporal stability of a measurement instrument. However, these plots are commonly used only to determine the maximum integration time and the best achievable precision. In this study, Allan-Werle-plots were used to analyze the temporal characteristics of the newest iteration of the TRISTAR instrument both for ground operation inside the research aircraft and for airborne operation. Characterizing the instrumental stability through the maximum integration time alone is not beneficial because of the high time resolution desired during research flights. Moreover, an unstable measurement environment such as a moving aircraft can lead to strong drifts.

Instead, the Allan-Werle-plots were used to identify the dominant noise types present during the measurement. Via least-squares fitting the contributions of white noise, random walk noise, flicker noise, meta-stable random walks and sinusoidal contributions were obtained for the different measurement conditions.

Simulation of data that recreates the characteristics observed in the Allan-Werle-plots allowed for an extensive analysis of the choice of the interval between calibrations. Several metrics were defined to identify the capability of the procedure to correct for drifts and estimate the true measurement uncertainty. An interval of \(30\,\)min was found sufficient for complete drift correction during ground operation. Thus, estimation of the measurement uncertainty with only the precision was enabled. The expectations were verified using a second data set with the appropriate calibration frequency.

The same analysis was carried out using data from two test flights. Segmentation of the time series according to the sudden changes in the measurement conditions enabled resolution of the dominant noise types for each individual segment during the flight. Dominant sinusoidal contributions were observed during a high altitude segment due to oscillations in the absorption cell pressure regulation. Low-frequency contributions dominated the takeoff and landing phases. Investigation of stable segments during the flight showed similar characteristics to ground operation. This indicated that an interval of \(30\,\)min between calibrations is sufficient during research flights for measurement uncertainty estimation and drift correction. However, apparent flicker and sinusoidal contributions led to a lowered accuracy for reasonable calibration frequencies. Therefore, the measurement uncertainty had to be estimated considering the reproducibility of calibrations. A second test flight verified the estimated measurement uncertainty for the chosen calibration frequency.

The results of this study demonstrate a further application of Allan-Werle-plots that improves the understanding and the accuracy of infrared absorption spectrometers. The implications for drift correction and measurement uncertainty are also transferable to other instruments for trace gas measurements.