1 Introduction

Although process modeling has greatly improved in the recent past, the capability to properly simulate the water and energy cycles is still limited by knowledge gaps. This translates into an inability to precisely forecast extreme precipitation events and results in an inaccurate assessment of the impact of natural variability and human activity on the water and energy cycles on both the regional and the global scales (IPCC 1996). Closing these knowledge gaps requires an appropriate comprehension of the key processes influenced by the three-dimensional distribution of atmospheric water vapor and temperature. Consequently, accurate and high-resolution measurements of the vertical profiles of water vapor mixing ratio and temperature are needed, with space and time resolution high enough to resolve the atmospheric boundary layer structure, including its surface, mixed, and interfacial layers (Wulfmeyer et al. 2015).

This is particularly true in the Mediterranean coastline area, which is largely affected by heavy precipitation and flash flooding events as a result of specific forcing mechanisms (among others, orography, radiation, low-level convergence) and the abundant evaporation from the sea surface. These events, which each year cause casualties and damage amounting to hundreds of millions of euros (e.g., Ricard et al. 2012), are generally associated with increasing humidity levels within the boundary layer and the lower part of the free troposphere (Keil et al. 2008). In this regard, accurate knowledge of the three-dimensional temperature field, in addition to the humidity field, is of paramount importance to properly assess the degree of instability in the atmosphere and the potential for convective activity. Hence, the availability of densely distributed, accurate, and high-resolution measurements of the vertical profiles of water vapor mixing ratio and temperature is even more important in these areas. Furthermore, better modeling of heavy precipitation and flash flooding events can be achieved through an effective data assimilation of water vapor mixing ratio and temperature profile measurements (Wulfmeyer et al. 2006; Adam et al. 2016).

Unfortunately, none of the traditional observational techniques for measuring water vapor and temperature profiles possesses the performance to adequately sample the troposphere with the required high space and time resolution. Some observational networks, such as the network of upper air radiosounding stations, are characterized by high accuracy and vertical resolution, but lack the required temporal resolution and density distribution of sensors, as in fact the global radiosonde network is quite sparse, including ~ 800 stations, and has limited coverage in oceanic areas. Other networks, such as the global network of ground-based microwave radiometers (Cadeddu et al. 2013) or the network of ground-based global navigation satellite system (GNSS) receivers (Bevis et al. 1992), are characterized by a sufficiently high density of sensors, high accuracy, and temporal resolution, but lack the needed vertical resolution.

Lidar systems based on the application of the differential absorption lidar (DIAL) technique or the Raman technique can potentially overcome the limitations of the above-mentioned observation systems, providing water vapor and temperature profiles with high vertical and temporal resolution, high precision, and low bias. At present, no consolidated network of water vapor and temperature lidar systems has been established and their distribution is very sparse. However, an effective demonstration of the high performance and large potential of lidar systems may increase the confidence of the data user community in the future exploitation of these sensors and their data.

In the direction of demonstrating the performance potentials of these systems and assessing their capabilities in terms of measurement precision and accuracy, a comprehensive inter-comparison effort has been carried out and results are reported in this paper. Water vapor mixing ratio profiles from a DIAL and two Raman lidar systems, as well as temperature profiles from one Raman lidar system, are considered and compared with measurements from other sensors operating during the first Special Observing Period of the Hydrological cycle in the Mediterranean Experiment (HyMeX-SOP1). Considering simultaneous and co-located profiles from the different sensors, the present inter-comparison effort provides accurate estimates of the uncertainty affecting the lidar systems and the other involved sensors. Performance of the different systems is expressed in terms of mutual biases and root-mean-square (RMS) deviations between the different sensor pairs as a function of altitude.

The need for an accurate assessment of different lidar systems’ performance has already been recognized in previous studies (Whiteman et al. 2006; Behrendt et al. 2007a, 2007b; Bhawar et al. 2011; Chazette et al. 2016). Such studies are an essential step towards ensuring that the various water vapor and temperature datasets acquired during field experiments are consistent and self-coherent across a region of interest and can be used unambiguously for process analyses, model validation, and comparison with space-borne retrievals.

More specifically, the papers by Behrendt et al. (2007a, 2007b) reported results from a large inter-comparison effort involving different water vapor lidar systems, which was carried out in the framework of the IHOP 2002 project (Weckwerth et al. 2004) over the Great Plains region of the United States of America. This inter-comparison study involved four lidar systems (three airborne, one ground-based) and meteorological and chilled mirror radiosondes. An even larger number of lidar systems (namely six: three DIALs and three Raman lidars, four of them ground-based and two airborne) was involved in the inter-comparison study reported by Bhawar et al. (2011) in the framework of the Convective and Orographically-induced Precipitation Study (COPS, Wulfmeyer et al. 2011) in the region of the Vosges and the Black Forest (France and Germany). In the present study, we consider only three lidar systems (two ground-based Raman lidars, BASIL and WALI, and one airborne differential absorption lidar, LEANDRE 2), but we extend the inter-comparison to a larger ensemble of additional sensors (a microwave radiometer, aircraft in situ thermodynamic sensors, and meteorological radiosondes). The present inter-comparison effort also considers atmospheric temperature, in addition to water vapor mixing ratio. This paper is complementary to the one by Chazette et al. (2016), which focuses on the inter-comparison of analogous water vapor datasets over the island of Menorca, Spain (WALI, radiosoundings, aircraft in situ measurements during takeoff, and LEANDRE 2 profiles). In the present paper, we also assess the coherence of the HyMeX water vapor datasets across the North-Western Mediterranean by comparing the observations by BASIL and WALI, which were carried out during the final phase of SOP1, thus bridging the thorough inter-comparison efforts conducted in Southern France and the Balearic Islands.

The outline of the paper is the following. The “HyMeX-SOP1 measurement strategy in the North-Western Mediterranean basin” section briefly describes the HyMeX project and the SOP1 measurement strategy. The “Profiling sensors involved in the inter-comparison” section provides a short description of the lidar systems and the sensors involved in this inter-comparison effort, including an assessment of the lidar systems’ uncertainty in terms of both random and systematic errors. The “Statistical quantities considered in the inter-comparison: definition of bias and root-mean-square deviation” section describes the algorithms used to compare different sensors and estimate the bias and root-mean-square deviation as a function of altitude for the different instrument pairs. The “Results” section illustrates and discusses the inter-comparison results for the different sensor pairs. Results are summarized in the “Summary and future perspectives” section.

2 HyMeX-SOP1 measurement strategy in the North-Western Mediterranean basin

Because of its location in between the tropical and the mid-latitude climate regions, the Mediterranean basin is characterized by a delicate and unstable climate balance determined by influences from these two climatic regions. As this balance is very sensitive to disturbances, small system perturbations may determine drastic changes in surface temperature and the hydrological cycle in the Mediterranean area. Most climate models forecast a significant decrease in precipitation over the next century and an increase in variability during the hot season (among others, Giorgi and Lionello 2007), combined with a surface temperature increase of 1 to 6 °C by 2100. However, large uncertainties characterize the future evolution of these climate scenarios and, consequently, substantial improvements are needed in monitoring and modeling the Mediterranean climate in order to reduce these uncertainties. This is one of the strong motivations behind the Hydrological Cycle in the Mediterranean Experiment (HyMeX, Ducrocq et al. 2014). More specifically, HyMeX aims at improving the comprehension of the water cycle, with particular emphasis on its role in extreme events, through the monitoring and modeling of the land-ocean-atmosphere system. HyMeX included a Special Observation Period (SOP1), lasting a few months (September–November 2012), with a specific focus on the study of heavy precipitation and flash flood events. HyMeX-SOP1 aimed at providing detailed observations of specific water cycle key processes in three selected “target areas” (North-Western Mediterranean, South-Eastern Mediterranean, and Adriatic). Within the Western Mediterranean target area, five sites were selected for the deployment of research instruments, namely the Cévennes-Vivarais and the Corsica sites in Southern France, the Central Italy and Northeastern Italy sites, and the Spanish Balearic Islands site in Menorca (Ducrocq et al. 2014; Bousquet et al. 2015; Legain et al. 2013).
Operational meteorological and hydrological ground networks were also available. Throughout the duration of HyMeX-SOP1, the University of BASILicata ground-based Raman Lidar system (BASIL) was deployed in the Cévennes-Vivarais site (Candillargues, Southern France, Lat: 43° 37′ N; Long: 4° 4′ E; Elev: 1 m) and operated between 5 September and 5 November 2012, collecting more than 600 h of measurements, distributed over 51 measurement days and 19 intensive observation periods (IOPs). The major feature of BASIL is its capability to provide high-resolution and accurate measurements of atmospheric temperature and water vapor, both in daytime and nighttime, based on the application of the rotational and vibrational Raman lidar techniques in the UV. BASIL is also capable of measuring water vapor and temperature profiles throughout clouds, as long as these are not optically thick, as opposed to DIAL systems, where the Doppler broadening effect associated with these clouds cannot be completely corrected for. This observational capability makes BASIL a well-suited tool for the characterization of the water vapor inflow in Southern France, which is an important piece of information to improve the comprehension and forecasting capabilities of heavy precipitation events, especially in the North-Western Mediterranean basin.

In addition to BASIL, other state-of-the-art sensors with water vapor mixing ratio and temperature profiling capability were present in the supersite of Candillargues. These included a microwave radiometer, GNSS receiving stations, and a radiosonde launching facility. Furthermore, the ground-based water vapor and aerosol Raman lidar WALI (Chazette et al. 2014) was temporarily deployed in Candillargues, in the proximity of BASIL, for a few hours on the evening of 30 October 2012, and comparisons from these two Raman lidars were also possible.

A careful verification of the performance and data quality of the involved sensors is of primary importance for a successful measurement campaign, especially when driven by scientific motivations such as those identified for HyMeX. Additionally, a data quality assessment based on a thorough inter-comparison involving all different humidity/temperature sensors is also necessary to ensure a consistent dataset and provide accurate error estimates for all systems. The need for a dedicated water vapor and temperature inter-comparison effort to be carried out during HyMeX-SOP1 was also recognized by the European Commission, which funded the Project “Inter-comparison of airborne and ground-based Lidar measurements for the characterization of atmospheric water vapour and temperature profiles” (WaLiTemp) in the framework of the Integrating Activity “European Facility For Airborne Research” (EUFAR) of the 7th Framework Program. This project allocated eight dedicated flight hours of the French research aircraft ATR42, operated by the Service des Avions Instrumentés pour la Recherche en Environnement (SAFIRE). The ATR42 was equipped with the airborne DIAL LEANDRE 2, capable of profiling water vapor mixing ratio above or beneath the aircraft, and with a variety of in situ sensors for turbulence measurements, as well as with aerosol/cloud microphysics probes. Specific temperature/pressure/humidity probes were also present (more details on these probes are provided in the “Thermodynamic sensors onboard the ATR42” section). The present inter-comparison effort considers atmospheric water vapor measurements from airborne and ground-based lidar systems and additional airborne and ground-based remote sensing and in situ sensors (a microwave radiometer, aircraft in situ sensors, and meteorological radiosondes), and atmospheric temperature measurements from a ground-based lidar, a microwave radiometer, aircraft in situ sensors, and radiosondes.

The ATR42 was stationed in Montpellier airport, having the possibility of refueling in the Balearic Islands or Corsica. A specific flight pattern was conceived for the purpose of this inter-comparison effort (Fig. 1). This pattern consisted of an elongated circle path, with the aircraft spiraling up and down around a central location, which was initially aimed to be the Cévennes-Vivarais site in Candillargues. Unfortunately, because of air traffic restrictions and the lidar operation startup time, the closest LEANDRE 2 profiles to Candillargues are typically about 30 km eastward.

Fig. 1 ATR42 flight pattern considered in the EUFAR-WaLiTemp flights (red line). The black dot represents the position of Montpellier Airport, where the ATR42 took off and landed, while the red dot represents the position of the Raman lidar BASIL

Comparisons between BASIL and LEANDRE 2 are also aimed at assessing the representativeness of ground-based lidar measurements, i.e., at assessing how representative 1D profiles from ground-based lidars are of a certain region in any given meteorological situation. In fact, vertical profiles from ground-based lidars are frequently used in dynamical models for assimilation or verification purposes, and these models have a grid size much larger than the size of the sounded air masses, i.e., the portion of the air masses intercepted by the laser beam. However, this important aspect is only marginally addressed in the present manuscript.

3 Profiling sensors involved in the inter-comparison

3.1 Water vapor lidars

Three water vapor lidar systems participated in this inter-comparison effort: the ground-based Raman lidar BASIL, with water vapor, temperature, and particle backscatter profiling capability; the airborne DIAL LEANDRE 2, with water vapor and particle backscatter profiling capability; and the ground-based lidar WALI, with water vapor and particle backscatter profiling capability. WALI was temporarily deployed in Candillargues, in the proximity of BASIL (within 10-m distance), for a few hours on the evening of 30 October 2012.

BASIL makes use of a Nd:YAG laser source, equipped with second and third harmonic generation crystals, capable of emitting pulses at 1064, 532, and 355 nm. Laser pulses at these three wavelengths are simultaneously transmitted in the atmosphere along the zenith. The shortest of these wavelengths (355 nm) is used to stimulate vibrational and rotational Raman scattering from water vapor, nitrogen, and oxygen molecules. These scattering processes, and the corresponding lidar echoes, are exploited to ultimately measure the vertical profiles of water vapor mixing ratio (Whiteman et al. 1992; Di Girolamo et al. 2016, 2018a) and temperature (Behrendt and Reichardt 2000; Di Girolamo et al. 2006). The use of the 355 nm laser wavelength maximizes lidar performance, since Raman scattering cross-sections are proportional to the inverse fourth power of the wavelength. For the specific operation of HyMeX-SOP1, the laser source was upgraded to deliver a single-pulse energy at 355 nm of 500 mJ (corresponding to an average power of 10 W at a pulse repetition frequency of 20 Hz). The receiver is built around a Newtonian telescope, with a primary mirror diameter of 0.45 m. The receiver also includes two small lenses, with a diameter of 0.05 m, which are used for the collection of the 532 and 1064 nm elastic signals. The key feature of BASIL is its capability to perform daytime and nighttime high-resolution and accurate measurements of atmospheric temperature and water vapor based on the application of the rotational and vibrational Raman lidar technique, respectively (Di Girolamo et al. 2004, 2009a, 2009b, 2017; De Rosa et al. 2019), together with providing measurements of particle backscatter at 355, 532, and 1064 nm; particle extinction at 355 and 532 nm; and particle depolarization at 355 and 532 nm (Di Girolamo et al. 2009b, 2012a, 2012b, 2018b).
The procedure applied to calibrate water vapor mixing ratio and temperature profile measurements by BASIL was carefully described in Di Girolamo et al. (2009a). During HyMeX-SOP1, BASIL was calibrated based on the comparison with radiosondes launched directly from the lidar site in Candillargues, with the launching facility being approximately 100–150 m away from the lidar system (Di Girolamo et al. 2016). More specifically, mean calibration coefficients were estimated by comparing BASIL with the radiosonde data at all times when both systems were simultaneously operated (for a total of approximately 50 comparisons). Because of the proximity of BASIL to the radiosonde launching facility, the comparisons needed to estimate the calibration coefficients were carried out in the vertical region from the surface up to approximately 2 km. Within this altitude interval, Raman lidar signals are strong and characterized by high signal-to-noise ratios and small statistical fluctuations. At the same time, within this altitude interval, the horizontal drift of the radiosonde with respect to the vertical of lidar station is limited, so that again the two sensors can be actually assumed to be sounding the same air masses. The water vapor mixing ratio calibration constant c is obtained through a best-fit procedure applied to the Raman lidar and radiosonde data, the value of the constant being determined by minimizing the root-mean-square deviation between the single data points from the two profiles within the altitude interval up to 2 km, exploiting a simple linear relationship linking the power ratio of the roto-vibrational Raman lidar signals from water vapor and nitrogen molecules, \( {P}_{{\mathrm{H}}_2\mathrm{O}}(z) \)/\( {P}_{{\mathrm{N}}_2}(z) \), to the water vapor mixing ratio profile, \( {x}_{{\mathrm{H}}_2\mathrm{O}}(z) \). 
As the Raman lidar and the radiosonde data have different altitude arrays, for the purpose of applying the best-fit algorithm, radiosonde data have been interpolated to the Raman lidar altitude levels. The mean value of the calibration constant \( \overline{c} \) is obtained by averaging the single calibration coefficient values from all 50 inter-comparisons. This value is then used for the calibration of all water vapor profile measurements by BASIL during HyMeX-SOP1. The use of the simultaneous radiosonde measurements for the determination of a mean calibration constant \( \overline{c} \) of BASIL, especially if limited to the lower 2 km, has marginal effects on the estimate of the single mutual bias and root-mean-square deviation profiles between BASIL and the radiosondes. The uncertainty affecting the calibration constant, σc, is estimated as the standard deviation of all single calibration values from the mean value. The value of \( \overline{c} \) is found to be equal to 85.21, while the value of σc is found to be equal to 3.22. The standard deviation, expressed in percentage, is 3.8%.
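The calibration steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the processing code used for BASIL: the function names (`interp_linear`, `calibration_constant`, `summarize`) are hypothetical, the fit assumes the simple linear relationship between the mixing ratio and the signal power ratio with no offset term, and the sample standard deviation (N − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def interp_linear(z_new, z, values):
    """Linearly interpolate (z, values) onto z_new; z must be increasing."""
    out = []
    for zn in z_new:
        if zn <= z[0]:          # clamp below the first radiosonde level
            out.append(values[0])
        elif zn >= z[-1]:       # clamp above the last radiosonde level
            out.append(values[-1])
        else:
            for j in range(1, len(z)):
                if zn <= z[j]:
                    w = (zn - z[j - 1]) / (z[j] - z[j - 1])
                    out.append(values[j - 1] + w * (values[j] - values[j - 1]))
                    break
    return out


def calibration_constant(z_lidar, ratio, z_sonde, x_sonde, z_max=2000.0):
    """Least-squares c minimizing sum (c * ratio - x_sonde)^2 below z_max (m).

    ratio is the Raman signal power ratio P_H2O / P_N2 on the lidar levels;
    x_sonde is the radiosonde mixing ratio on its own levels.
    """
    x_on_lidar = interp_linear(z_lidar, z_sonde, x_sonde)
    pairs = [(r, x) for zl, r, x in zip(z_lidar, ratio, x_on_lidar) if zl <= z_max]
    num = sum(r * x for r, x in pairs)
    den = sum(r * r for r, _ in pairs)
    return num / den


def summarize(constants):
    """Mean constant, its standard deviation, and the deviation in percent."""
    c_mean = mean(constants)
    sigma_c = stdev(constants)  # assumption: sample standard deviation
    return c_mean, sigma_c, 100.0 * sigma_c / c_mean
```

Minimizing the root-mean-square deviation of a purely proportional model has the closed-form solution used in `calibration_constant`, so no iterative optimizer is needed. With the values quoted in the text, 100 × 3.22 / 85.21 gives the stated 3.8%.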

Considering an integration time of 5 min and a vertical resolution of 150 m, the random uncertainty affecting daytime water vapor mixing ratio measurements is found to be smaller than 0.2 g kg−1 (or 20%) up to 3 km and smaller than 0.3 g kg−1 (or 50%) up to 5 km, while the random uncertainty affecting daytime temperature measurements does not exceed 0.4 K up to 3 km and 1 K up to 5 km. Nighttime performance is characterized by much smaller uncertainties, with a random error affecting water vapor mixing ratio measurements not exceeding 0.04 g kg−1 (or 2%) at 3 km and of 0.002 g kg−1 (or 10%) at 9 km and a random error affecting temperature measurements smaller than 0.4 K at 3 km and of 1 K at 6.5 km.

LEANDRE 2 makes use of a flash-lamp-pumped Alexandrite laser whose emission wavelengths can be tuned upon a water vapor absorption line selected within the spectral range 727–770 nm. During HyMeX-SOP1, LEANDRE 2 was operated mainly on two weak water vapor absorption bands (at 730.4 and 731.2 nm); this wavelength selection is optimal for water vapor measurements in the presence of moderate integrated water vapor contents. The receiver consists of a 30-cm aperture telescope, with a 3.5-mrad field of view, and a 1-nm filter bandwidth. More details concerning the DIAL design and signal processing procedure can be found in Bruneau et al. (2001a, b). During HyMeX-SOP1, LEANDRE 2 was operated both in zenith- and nadir-pointing mode.

The water vapor and aerosol lidar WALI (Chazette et al. 2014) is a transportable system, which was specifically designed to meet HyMeX-SOP1 scientific goals. During HyMeX-SOP1, this ground-based Raman lidar was deployed in Ciutadella (40° 00′ 00″ N and 3° 50′ 20″ E, Menorca, Balearic Islands), where it operated almost continuously over 40 days (19 September–28 October 2012). On 30 October 2012, during its transfer journey from Spain to Paris, the system stopped over at the Cévennes-Vivarais site in Candillargues and was deployed close to BASIL, where it was operated for a few hours in the evening. The instrument setup, the calibration procedure, and the associated error assessment are described in Chazette et al. (2014, 2016). Besides water vapor mixing ratio, WALI can also provide aerosol backscattering, extinction, and depolarization measurements at 355 nm. The compact optical design of WALI enables accurate nighttime water vapor mixing ratio measurements between 0.3 and ~ 7 km, while daytime performance is substantially degraded, with a relative uncertainty smaller than 10% only up to ∼ 1 km, considering a temporal averaging interval of ∼ 1 h.

In this paper, comparisons between LEANDRE 2 and all other sensors were carried out in terms of water vapor mixing ratio \( {x}_{{\mathrm{H}}_2\mathrm{O}}(z) \). This parameter is a by-product of LEANDRE 2, whose water vapor measurements are expressed in terms of the water vapor number concentration \( {n}_{{\mathrm{H}}_2\mathrm{O}}(z) \). \( {x}_{{\mathrm{H}}_2\mathrm{O}}(z) \) can be calculated from \( {n}_{{\mathrm{H}}_2\mathrm{O}}(z) \) through the expression (Warnecke 1997):

$$ {x}_{{\mathrm{H}}_2\mathrm{O}}={\left(\frac{p}{m_{{\mathrm{H}}_2\mathrm{O}}{n}_{{\mathrm{H}}_2\mathrm{O}} RT}-1.6078\right)}^{-1} $$
(1)

with \( {m}_{{\mathrm{H}}_2\mathrm{O}} \) being the water molecular mass (18.01), R being the gas constant of dry air (0.287 J g−1 K−1), and p and T being the environmental pressure and temperature, respectively. LEANDRE 2 measurements of \( {n}_{{\mathrm{H}}_2\mathrm{O}}(z) \) are converted into \( {x}_{{\mathrm{H}}_2\mathrm{O}}(z) \) using temperature and pressure information from two in situ sensors onboard the aircraft manufactured by Rosemount (model PT102E2AL for temperature and model 1201-1221 for pressure); this procedure is applicable only during takeoff and landing phases in the proximity of the airports (Chazette et al. 2016). In all other cases, temperature and pressure information is taken from a reference atmospheric model (mid-latitude summer). The systematic uncertainty associated with the conversion of water vapor number concentration to mixing ratio is estimated to not exceed 0.02 g kg−1.
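As an illustration of Eq. 1, the following Python sketch converts a water vapor number concentration into a mixing ratio. It is not the LEANDRE 2 processing code. The function name and the unit choices are assumptions: the number concentration is taken in molecules per cubic metre, the pressure in Pa, and the temperature in K, and Avogadro's number is introduced here to turn the molar mass quoted in the text into a mass per molecule, so that the product of molecular mass and number concentration is a vapor density in g m−3.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (not given in the text)
M_H2O = 18.01             # g mol-1, value quoted in the text
R_DRY = 0.287             # J g-1 K-1, gas constant of dry air quoted in the text


def mixing_ratio_g_per_kg(n_h2o, pressure_pa, temperature_k):
    """Water vapor mixing ratio (g kg-1) from number concentration (m-3), Eq. 1."""
    if n_h2o <= 0:
        raise ValueError("number concentration must be positive")
    rho_v = n_h2o * M_H2O / AVOGADRO  # vapor density in g m-3
    inverse_x = pressure_pa / (rho_v * R_DRY * temperature_k) - 1.6078
    return 1000.0 / inverse_x          # g g-1 converted to g kg-1


if __name__ == "__main__":
    # Hypothetical near-surface values, for illustration only.
    print(mixing_ratio_g_per_kg(4.0e23, 101325.0, 288.15))
```

The constant 1.6078 is the ratio of the gas constants of water vapor and dry air, which is why the pressure entering the expression is the total pressure rather than the dry-air partial pressure.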

In previous inter-comparison efforts involving ground-based and airborne lidars, a cutoff distance between the footprint of the airborne system and ground-based station was selected. For example, Behrendt et al. (2007a, 2007b) used a cutoff distance of 20 km during IHOP 2002, as this investigation area (US Great Plains) was mostly flat, while Bhawar et al. (2011) used a cutoff distance of 10 km during COPS, as this took place in a mountain area (Black Forest and Rhine Valley), with a highly variable orography, resulting in a larger horizontal variability of the humidity field. No specific cutoff distance was considered in the present study, as in fact most comparisons between BASIL and LEANDRE 2 were possible during the specific ATR42 flights in the framework of the EUFAR Project “WaLiTemp,” with a typical distance between BASIL and the aircraft footprint of 6–8 km, and during takeoff and landing for all other flights, with a typical minimum distance in the range 20–40 km.

3.2 Microwave radiometer

During HyMeX-SOP1, a microwave radiometer (MWR) (model: HATPRO, manufactured by Radiometer Physics GmbH - RPG) was deployed in Candillargues (approximately 50 m away from the radiosonde launching facility and 150–200 m away from BASIL) from 11 October to 5 November 2012. Due to networking and data archiving problems, the MWR operated on only 12 days during this period, i.e., October 11, 12, 15, 17, 21, 22, 25, 26, 29, 30, and 31 and November 5. The system is capable of retrieving vertical profiles of temperature and specific humidity from the surface up to ~ 10 km, as well as integrated liquid water content. The MWR exploits channels in the spectral interval 20–60 GHz, with channels in the 20–30 GHz region used to retrieve the humidity profiles and channels in the 50–60 GHz region used to retrieve the temperature profiles. Atmospheric temperature and specific humidity profiles are inferred from the measured brightness temperatures through the solution of inverse problems, based on the use of a priori information from a radiosonde climatology (Cimini et al. 2006). The accuracy of microwave radiometer temperature profile measurements, expressed in terms of the RMS deviation between consecutive measurements, is specified by the manufacturer to be 0.25 K in the altitude interval 0–0.5 km, 0.5 K in the altitude interval 0.5–1.2 km, 0.75 K in the altitude interval 1.2–4 km, and 1 K in the altitude interval 4–10 km, while the accuracy of absolute humidity/water vapor mixing ratio profile measurements by this same sensor is specified to be 0.1 g m−3/0.08 g kg−1 throughout the troposphere.
Microwave radiometer data are processed and made available at 93 altitude levels up to 10 km, with a vertical resolution of 25 m up to 0.1 km, 30 m in the altitude interval 0.1–0.5 km, 40 m in the interval 0.5–1.2 km, 60 m in the interval 1.2–1.8 km, 90 m in the interval 1.8–2.5 km, 120 m in the interval 2.5–3.5 km, 160 m in the interval 3.5–4.5 km, 200 m in the interval 4.5–6 km, and 300 m in the interval 6–10 km.

3.3 Thermodynamic sensors onboard the ATR42

As mentioned above, in addition to LEANDRE 2, the ATR42 hosted thermodynamic (temperature/pressure/humidity) probes, of which the temperature and humidity probes are considered in the present inter-comparison effort. The humidity sensors are a hygrometer, developed by LI-COR (model: 7500A/RS), and a capacitive probe (Humicap-H, Vaisala, with an accuracy of ± 0.8% or 0.05–0.07 g kg−1). The temperature sensor is a platinum resistance wire probe, developed by Rosemount (model: PT102E2AL, with an accuracy of ± 0.10 °C). The vertical resolution of the measurements depends on the aircraft vertical speed. Considering a vertical speed of 6 m s−1, which is a typical value during the ascent (spiraling-up) and descent (spiraling-down) of the aircraft, the vertical resolution of the airborne in situ sensors is 3–5 m.

3.4 Radiosoundings

Radiosonde humidity and temperature profiles remain a primary source of information for numerical weather prediction model assimilation systems (Ingleby and Edwards 2015). During HyMeX-SOP1, a radiosonde launching facility was set up in Candillargues. The radiosondes launched were manufactured by Vaisala (model: RS92). Radiosondes were launched without a predefined schedule, primarily during the intensive observation periods, with a launching rate of up to one launch every 1.5 h. These sondes provide vertical profiles of atmospheric pressure, temperature, humidity, and wind (direction and speed) during both the ascent and descent phases. The thin-wire temperature sensor, characterized by a very fast response time, is protected with a hydrophobic coating to reduce the effects of evaporative cooling after emerging from clouds, while two thin-film polymer humidity sensors are present, which are pulse heated to prevent them from freezing and undergo a reconditioning process before launch aimed at minimizing accuracy degradation effects associated with the accumulation of chemical contaminants. The uncertainty affecting temperature measurements is specified by the manufacturer to not exceed 0.3 °C up to 16 km both in day and night, while the uncertainty affecting humidity measurements, expressed in terms of RH, is specified to not exceed 3% for temperatures higher than − 40 °C both in day and night, which corresponds to 0.20–0.25 g kg−1 when translated into water vapor mixing ratio uncertainty. Pressure measurements, which are carried out with a silicon pressure sensor, have an accuracy in the range 0.04–0.1 hPa at pressure levels below 100 hPa (~ 16 km).

The vertical resolution of radiosonde measurements is dependent on the sonde vertical ascent speed, the response time of each specific sensor, and the measurement cycle (1 s). For temperature measurements, the response time of the thin-wire sensor is smaller than 0.4 s at 1000 hPa (surface level), smaller than 1 s at the 100 hPa pressure level (~ 16 km), and smaller than 2.5 s at the 10 hPa pressure level (~ 30 km). For humidity measurements, the response time of the thin-film polymer humidity sensors is smaller than 0.5 s at a temperature of + 20 °C and smaller than 20 s at a temperature of − 40 °C. Considering a radiosonde vertical ascent speed of 5 m s−1, the maximum vertical resolution of temperature measurements is 5–12.5 m, while the maximum vertical resolution of humidity measurements is 5–100 m. Radiosonde data considered in this study were processed and made available at 49 altitude levels up to 13 km (Bock et al. 2016).
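The resolution ranges quoted above are consistent with a simple rule: the sonde ascent speed multiplied by the longer of the sensor response time and the 1-s measurement cycle. The Python sketch below checks this arithmetic. The rule itself is an assumption inferred from the quoted numbers rather than a formula stated by the manufacturer or in the text, and the function name is hypothetical.

```python
def vertical_resolution_m(ascent_speed, response_time, cycle=1.0):
    """Vertical distance (m) travelled during the limiting time interval.

    ascent_speed is in m s-1; response_time and cycle are in s.
    Assumption: resolution is limited by whichever is longer, the sensor
    response time or the measurement cycle.
    """
    return ascent_speed * max(response_time, cycle)


if __name__ == "__main__":
    speed = 5.0  # m s-1, ascent speed considered in the text
    # Temperature sensor: response time 0.4 s (surface) to 2.5 s (10 hPa).
    print(vertical_resolution_m(speed, 0.4), vertical_resolution_m(speed, 2.5))
    # Humidity sensor: response time 0.5 s (+20 degC) to 20 s (-40 degC).
    print(vertical_resolution_m(speed, 0.5), vertical_resolution_m(speed, 20.0))
```

Under this assumption the temperature resolution spans 5 to 12.5 m and the humidity resolution spans 5 to 100 m, matching the ranges given in the text. Near the surface the measurement cycle, not the sensor response, is the limiting factor.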

4 Statistical quantities considered in the inter-comparison: definition of bias and root-mean-square deviation

For the purpose of performing an appropriate error analysis of the different sensors involved in the inter-comparison effort, simultaneous and co-located data from the different sensors’ pairs are considered. These are used to compute the percentage mutual bias and root-mean-square deviation between the two sensors as a function of altitude.

The bias identifies an offset between the two sensors which may be caused by different sources of systematic error affecting the two measurements, primarily associated with the considered calibration procedures or the different instrumental characteristics of the compared sensors. Thus, the bias quantifies the relative accuracy of the compared instruments. The bias affecting water vapor and temperature profiling sensors may be height dependent. For example, the bias affecting water vapor and temperature Raman lidar measurements has a height-dependent component, which may be associated with atmospheric effects (e.g., height-dependent effects on the atmospheric differential transmission at the two wavelengths considered for either water vapor mixing ratio or temperature measurements) or with the instrumental characteristics of the lidar (e.g., different overlap functions of Raman lidar channels, slightly nonlinear behavior of the detectors or acquisition systems). In contrast, the root-mean-square (RMS) deviation includes all possible differences between the profile measurements from the two compared instruments. It includes differences associated with both systematic and statistical uncertainties affecting the two instruments and the natural differences of the measured atmospheric parameter (either water vapor mixing ratio or temperature) in the different air masses sounded by the two sensors.

Expressions to compute the mean value of these quantities are the following (Behrendt et al. 2007a, 2007b; Bhawar et al. 2011):

$$ \mathrm{BIAS}\left({z}_1,{z}_2\right)=\frac{1}{N}\sum \limits_{i=1}^N{\mathrm{BIAS}}_i=\frac{2}{N}\sum \limits_{i=1}^N\left\{\frac{\sum \limits_{z_{\mathrm{k}}={z}_1}^{z_2}\left[{q}_{1,i}\left({z}_{\mathrm{k}}\right)-{q}_{2,i}\left({z}_{\mathrm{k}}\right)\right]}{\sum \limits_{z_{\mathrm{k}}={z}_1}^{z_2}\left[{q}_{1,i}\left({z}_{\mathrm{k}}\right)+{q}_{2,i}\left({z}_{\mathrm{k}}\right)\right]}\right\} $$
(2)
$$ \mathrm{RMS}\left({z}_1,{z}_2\right)=\frac{1}{N}\sum \limits_{i=1}^N{\mathrm{RMS}}_i=\frac{2}{N}\sum \limits_{i=1}^N\left\{\frac{\sqrt{N_{\mathrm{z}}\sum \limits_{z_{\mathrm{k}}={z}_1}^{z_2}{\left[{q}_{1,i}\left({z}_{\mathrm{k}}\right)-{q}_{2,i}\left({z}_{\mathrm{k}}\right)\right]}^2}}{\sum \limits_{z_{\mathrm{k}}={z}_1}^{z_2}\left[{q}_{1,i}\left({z}_{\mathrm{k}}\right)+{q}_{2,i}\left({z}_{\mathrm{k}}\right)\right]}\right\} $$

where q1,i(zk) and q2,i(zk) represent the water vapor mixing ratio or temperature values at altitude zk for sensor 1 and sensor 2, respectively, z1 and z2 are the lower and upper level of the considered altitude interval, respectively, and Nz is the number of data points for each sensor in this interval. As also in Behrendt et al. (2007a, 2007b) and in Bhawar et al. (2011), all inter-comparisons reported in this paper consider a vertical window width ∆z = z2 − z1 of 500 m. In the above expression for BIAS, we suppose that the summation over zk cancels out the random variations of q1 and q2. The number of vertical windows considered in each computation depends on the vertical coverage of the two inter-compared sensors. For example, when comparing BASIL and LEANDRE 2 data, as only LEANDRE 2 nadir data are considered, the comparison between these two sensors can be extended from the surface up to 3000 m, which leads to estimates of the bias and RMS deviation profiles with 6 different levels (0–500 m, 500–1000 m, 1000–1500 m, 1500–2000 m, 2000–2500 m, and 2500–3000 m). As an additional example, when comparing BASIL and radiosonde or in situ data, the comparison can be extended up to 6 km, which leads to estimates of the bias and RMS deviation profiles with 12 different levels (0–0.5 km, 0.5–1 km, 1–1.5 km, 1.5–2 km, 2–2.5 km, 2.5–3 km, 3–3.5 km, 3.5–4 km, 4–4.5 km, 4.5–5 km, 5–5.5 km, and 5.5–6 km). For most sensors’ pairs, the inter-comparison range is between 0.5 and 6 km.
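As an illustration of how the vertical windows follow from the vertical coverage, the sketch below enumerates contiguous 500-m windows between a bottom and a top altitude. The helper name `window_edges` and the use of integer altitudes in meters are assumptions made for this example only.

```python
def window_edges(z_bottom_m, z_top_m, width_m=500):
    """Return (lower, upper) edges of contiguous windows of fixed width."""
    n_windows = (z_top_m - z_bottom_m) // width_m
    return [
        (z_bottom_m + k * width_m, z_bottom_m + (k + 1) * width_m)
        for k in range(n_windows)
    ]


lidar_windows = window_edges(0, 3000)   # BASIL vs. LEANDRE 2 coverage
sonde_windows = window_edges(0, 6000)   # BASIL vs. radiosonde or in situ coverage

print(len(lidar_windows), lidar_windows[0], lidar_windows[-1])
print(len(sonde_windows), sonde_windows[-1])
```

With these inputs the helper yields 6 windows up to 3 km and 12 windows up to 6 km, in line with the counts given in the text.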

In the expressions above, we consider as the reference value the mean of the two sensors’ measurements instead of using the measurement result from one of the two. This approach leads to more objective results than considering one of the sensors as reference (Behrendt et al. 2007a, b; Bhawar et al. 2011). The approach of considering the mean of the two sensors as reference, i.e., attributing equal weight to the data reliability of each sensor, comes from the assumption that neither of the two can a priori be assumed more accurate than the other, and thus that the closest approximation to a reference profile is obtained by taking the mean of the two profiles.

The index i, with values in the range from 1 to N, identifies a specific inter-comparison sample, with N being the total number of possible comparisons for each sensors’ pair. Thus, profiles of mean bias and RMS deviation between the two sensors can be finally computed considering the total number N of possible inter-comparisons for each sensors’ pair. To apply expressions (2), a common altitude array has to be considered for each sensors’ pair, with data from one sensor being interpolated to the altitude levels of the other sensor.

Expressions (2) allow computing the percentage bias and RMS deviation. The absolute bias and RMS deviation are obtained from expressions (2) by multiplying these terms by the mean of the two sensors’ data, i.e., the quantity \( \sum \limits_{z_{\mathrm{k}}={z}_1}^{z_2}\left\{{q}_{1,i}\left({z}_{\mathrm{k}}\right)+{q}_{2,i}\left({z}_{\mathrm{k}}\right)\right\}/\left(2{N}_{\mathrm{z}}\right) \).
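A minimal sketch of expressions (2) for a single inter-comparison sample i and a single vertical window is given below, assuming NumPy and linear interpolation of sensor 2 onto the altitude levels of sensor 1. The function name `window_stats`, the half-open window [z_low, z_high), and the synthetic profiles are illustrative assumptions and are not taken from the campaign data. The mean profiles then follow by averaging the per-sample values over the N available comparisons.

```python
import numpy as np


def window_stats(z1_grid, q1, z2_grid, q2, z_low, z_high):
    """Relative and absolute bias/RMS of sensor 1 vs. sensor 2 in one window.

    z2_grid must be increasing, as required by np.interp.
    """
    # Bring sensor 2 onto the altitude levels of sensor 1 (common altitude array)
    q2_on_1 = np.interp(z1_grid, z2_grid, q2)
    in_window = (z1_grid >= z_low) & (z1_grid < z_high)
    a, b = q1[in_window], q2_on_1[in_window]
    n_z = a.size
    sum_of_sums = np.sum(a + b)
    # Expressions (2): differences normalized by the mean of the two sensors
    rel_bias = 2.0 * np.sum(a - b) / sum_of_sums
    rel_rms = 2.0 * np.sqrt(n_z * np.sum((a - b) ** 2)) / sum_of_sums
    # Mean of the two sensors' data in the window
    mean_value = sum_of_sums / (2.0 * n_z)
    return rel_bias, rel_rms, rel_bias * mean_value, rel_rms * mean_value


# Synthetic example: sensor 1 reads 8 g/kg, sensor 2 reads 6 g/kg everywhere
z1 = np.arange(0.0, 500.0, 100.0)
stats = window_stats(z1, np.full(z1.size, 8.0),
                     np.array([0.0, 500.0]), np.array([6.0, 6.0]),
                     0.0, 500.0)
rel_bias, rel_rms, abs_bias, abs_rms = stats
print(rel_bias, rel_rms, abs_bias, abs_rms)
```

In this constant-offset example the absolute bias and RMS deviation both equal the 2 g kg−1 offset, and the relative values equal 2/7, i.e., the offset divided by the 7 g kg−1 mean of the two sensors.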

From the mean bias and RMS deviation profiles, the vertically averaged mean bias, \( \overline{\mathrm{bias}} \), and RMS deviation, \( \overline{\mathrm{RMS}} \), are finally estimated (Bhawar et al. 2011); these are given by the expressions:

$$ \overline{\mathrm{bias}}=\sum \limits_{j=1}^{N_{\mathrm{t}}}{w}_j\left({\mathrm{bias}}_j\right)/\sum \limits_{j=1}^{N_{\mathrm{t}}}{w}_j $$
(3)
$$ \overline{\mathrm{RMS}}=\sum \limits_{j=1}^{N_{\mathrm{t}}}{w}_j\left({\mathrm{RMS}}_j\right)/\sum \limits_{j=1}^{N_{\mathrm{t}}}{w}_j $$

with biasj/RMSj denoting the bias or RMS deviation value at level j, wj being the weight attributed to this value, and Nt being the number of vertical windows (each having a vertical extent of 500 m).

The weight wj is given by the number of inter-comparisons possible in the jth vertical window and it varies from zero (minimum weight) to the total number N (maximum weight) of possible inter-comparisons between the two sensors. The use of a weighted mean is particularly needed in those inter-comparisons where the vertical coverage of the compared sensors may vary from one case study to the next. A weighted mean is necessary because, in case of missing data at some specific altitudes, the number of inter-comparisons is smaller than N, and thus, data from these altitudes must have a lower weight in the vertically averaged mean.

The vertical interval considered in the computation of the vertically averaged bias and RMS deviation may vary for the different sensor pairs depending on the vertical coverage of the compared sensors, the selection being driven by the sensor having the lower coverage. For example, when comparing BASIL and LEANDRE 2 data, as only nadir LEANDRE 2 profiles are considered, the comparison between these two sensors can be extended from surface level up to 3 km and thus Nt is equal to 6, this being the number of 500-m-width vertical windows up to 3 km (0–0.5 km, 0.5–1 km, 1–1.5 km, 1.5–2 km, 2–2.5 km, and 2.5–3 km); when comparing BASIL and the radiosonde or the in situ sensors, the comparison can be extended up to 6 km and thus Nt is equal to 12, this being the number of 500-m-width vertical windows up to 6 km.

It is to be specified that values of \( \overline{\mathrm{bias}} \) are typically smaller than the single mean bias values at different altitudes. This is because, in the vertically averaged bias, positive and negative values at the different altitudes average out. In order to overcome this issue and provide a vertically averaged quantity having values comparable with the single bias values at the different altitudes, the vertically averaged absolute mean bias, \( \left|\overline{\mathrm{bias}}\right| \), is introduced. \( \left|\overline{\mathrm{bias}}\right| \), defined as the weighted mean of the moduli of the single bias values at different altitudes, can be determined through the expression:

$$ \left|\overline{\mathrm{bias}}\right|=\sum \limits_{j=1}^{N_{\mathrm{t}}}{w}_j\left|{\mathrm{bias}}_j\right|/\sum \limits_{j=1}^{N_{\mathrm{t}}}{w}_j $$
(4)

In this expression, bias values with different signs at different altitudes do not cancel out. Consequently, values of \( \left|\overline{\mathrm{bias}}\right| \) are typically larger than the corresponding \( \overline{\mathrm{bias}} \) values. Values of \( \left|\overline{\mathrm{bias}}\right| \) and \( \overline{\mathrm{bias}} \) coincide in absolute value when one of the two sensors shows values systematically larger (or smaller) than the other sensor at all altitudes, which is the case for specific sensors’ pairs included in the present inter-comparison effort.
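Expressions (3) and (4) can be sketched as follows, assuming NumPy. The function name `vertical_means` and the three-window input values are purely illustrative and are not results from this study. Windows with zero weight are dropped before averaging so that missing values there cannot propagate into the result.

```python
import numpy as np


def vertical_means(bias_by_window, rms_by_window, counts):
    """Weighted vertical means of bias, |bias| and RMS deviation.

    counts[j] is the number of inter-comparisons available in window j (w_j).
    """
    w = np.asarray(counts, dtype=float)
    bias = np.asarray(bias_by_window, dtype=float)
    rms = np.asarray(rms_by_window, dtype=float)
    valid = w > 0                      # windows without data carry no weight
    total_weight = w[valid].sum()
    mean_bias = np.sum(w[valid] * bias[valid]) / total_weight              # expression (3)
    mean_rms = np.sum(w[valid] * rms[valid]) / total_weight                # expression (3)
    mean_abs_bias = np.sum(w[valid] * np.abs(bias[valid])) / total_weight  # expression (4)
    return mean_bias, mean_abs_bias, mean_rms


# Illustrative values only: the third window has fewer comparisons
mean_bias, mean_abs_bias, mean_rms = vertical_means(
    bias_by_window=[0.4, -0.2, 0.1],
    rms_by_window=[0.8, 0.6, 0.5],
    counts=[10, 10, 5],
)
print(mean_bias, mean_abs_bias, mean_rms)
```

Because the second window has a negative bias, the signed mean (0.10) is smaller than the mean of the moduli (0.26), which is the cancellation effect described above.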

The knowledge of the mutual bias between different sensors’ pairs ultimately allows getting an estimate of the overall bias affecting all considered sensors (De Rosa et al. 2019). However, in order for this to be possible, one of the sensors involved in the inter-comparison effort must have simultaneous and co-located data with all other sensors. For the present inter-comparison effort, this condition was met by the Raman lidar system BASIL. The overall bias estimate relies on the assumption that the performance of all involved sensors in terms of bias does not change (degrade or improve, i.e., the bias remains constant) throughout the duration of the inter-comparison effort. This approach assumes equal weight on the data reliability of each sensor. The estimate of the overall bias affecting each sensor is obtained by imposing that the summation of all mutual biases between all sensor pairs is equal to zero. More details on the estimate of the overall bias affecting all sensors will be given in the “Overall bias affecting all sensors” section.

5 Results

5.1 Comparisons between BASIL, LEANDRE 2, radiosondes, and the in situ sensors

Direct comparisons between BASIL, LEANDRE 2, radiosondes, and the in situ sensor were possible only in two specific case studies: 13 September and 2 October 2012, which are the dates of the first two ATR42 flights dedicated to the EUFAR-WaLiTemp Project. At this time of the field deployment, the MWR was not yet operational. However, on these 2 days, comparisons in terms of water vapor mixing ratio were possible between BASIL, LEANDRE 2, radiosondes, and the in situ sensors, while comparisons in terms of temperature were possible between BASIL, radiosondes, and the in situ sensors. When comparing with BASIL, in order to minimize effects associated with the sampling of different air masses, LEANDRE 2 data are averaged over a time period of 10 s, corresponding to a horizontal integration length of ∼ 1 km.

Figure 2 illustrates the time evolution of the water vapor mixing ratio as measured by BASIL over a ∼ 5-h time period from 17:46 to 22:58 UTC on 13 September 2012 (IOP 3). The figure is generated using a succession of 156 consecutive profiles (integration time ∆t = 2 min, vertical resolution ∆z = 150 m). This figure reveals the presence of a humid convective boundary layer (with \( {x}_{{\mathrm{H}}_2\mathrm{O}} \) values up to 6–8 g kg−1) and a number of humid layers above it, with a more marked and thicker layer in the altitude region 4.5–6 km (with \( {x}_{{\mathrm{H}}_2\mathrm{O}} \) values of 2–3 g kg−1). The figure also highlights the day-to-night transition: the final part of the daytime is clearly distinguishable on the left side of the figure, with noisy data above 4–5 km due to the enhanced solar background and its incomplete rejection by the interference filters.

Fig. 2

Time evolution of the water vapor mixing ratio (g kg−1) as measured by BASIL over the ∼ 5-h time period from 17:46 to 22:58 UTC on 13 September 2012. The red vertical line identifies the time when the comparison between the different sensors was carried out

During this measurement session, water vapor mixing ratio profiles measured by a radiosonde (launch at 20:30 UTC) and the in situ sensor (hygrometer) onboard the ATR42 were available, in addition to those measured by BASIL and LEANDRE 2. Profiles from all these sensors were compared (the comparison time being identified by the red vertical line in Fig. 2) and are illustrated in Fig. 3. It is to be specified that, because of a temporary laser failure, the operation of LEANDRE 2 was stopped after ∼ 1 h, so that the only available water vapor mixing ratio profile from this sensor during the flight on 13 September 2012 was collected in the time interval 20:20–20:23 UTC. The profile from BASIL is illustrated in Fig. 3 with two different integration times: 20:20–20:23 UTC, this interval being coincident with the LEANDRE 2 measurement, and 20:00–20:30 UTC, this interval covering most of the time window when the in situ humidity sensor was collecting data during the ascent (spiraling-up) and descent (spiraling-down) of the aircraft. Figure 3 reveals that the agreement between the different sensors is quite good throughout the vertical extent of the measurements. Figure 4 illustrates the vertical profiles of bias and RMS deviation for the different sensors’ pairs. The figure reveals that the bias between BASIL, LEANDRE 2, and the radiosonde does not exceed ± 1.0 g kg−1 or ± 20% up to 5 km, while the bias between these three sensors and the in situ humidity sensor does not exceed ± 0.6 g kg−1 or ± 20% within the boundary layer, but increases up to ± 1.5 g kg−1 or ± 100% in the free troposphere. As expected, RMS deviation values are always larger than bias values at any altitude, since the former quantity includes all possible differences between the two sensors associated with both systematic and statistical uncertainties and the natural differences in the water vapor content of the sounded air masses (Bhawar et al. 2011).

Fig. 3

Water vapor mixing ratio profiles from BASIL on 13 September 2012 (with two integration times: 20:20–20:23 UTC and 20:00–20:30 UTC), LEANDRE 2 (20:20–20:23 UTC), a radiosonde (launched at 20:30 UTC), and the in situ hygrometer (with two integration times: 19:56–20:35 UTC and 20:35–20:49 UTC)

Fig. 4

Water vapor mixing ratio percentage BIAS and BIAS (panels a and c) and percentage RMS deviation and RMS deviation (panels b and d) for the different sensors’ pairs compared on 13 September 2012. For the purpose of this comparison, for BASIL, we considered the integration time 20:20–20:23 UTC, while for the in situ hygrometer, we considered the integration time 19:56–20:35 UTC

Figure 5 illustrates the temperature profiles measured by BASIL, the radiosonde (launched at 20:30 UTC), and the in situ temperature sensor onboard the ATR42, considering the same time intervals selected for Fig. 3. All three sensors properly capture the temperature inversion at the top of the planetary boundary layer (PBL); however, this inversion appears more marked and steeper in the radiosonde profile. The temperature inversion is observed by the three sensors at slightly different heights: at 1.5 km by the radiosonde, at 1.7 km by BASIL, and at 1.9 km by the in situ sensors. Slight differences in the amplitude and height of the temperature inversion are the result of the slightly different air masses sounded by the three sensors. In this respect, it is to be pointed out that the radiosonde, launched from the same site where the Raman lidar is located, experienced a horizontal drift of ∼ 5 km during its ascent, while the ascent spiral of the ATR42 was centered at a distance of 8.2 km from the lidar station. Temperature fluctuations in the Raman lidar profile above 3 km are the result of the large statistical fluctuations affecting its signals at those heights.

Fig. 5

Temperature profiles from BASIL on 13 September 2012 (20:00–20:30 UTC), a radiosonde (launched at 20:30 UTC), and the in situ airborne thermistor (integration time: 19:56–20:35 UTC)

Figure 6 illustrates the vertical profiles of temperature bias and RMS deviation for the different sensors’ pairs. The figure reveals mutual bias values between all sensors not exceeding ± 1.5 K, with the only exception of the bias value of the radiosonde vs. both BASIL and the in situ sensor at the top of the PBL (2 km), where the marked temperature inversion is located. Above 3 km, smaller bias values are found between the radiosonde and the in situ sensor (around 0.5 K up to 5 km) than between BASIL and either the radiosonde or the in situ sensor (up to ± 1.5 K); this is most probably associated with the large statistical uncertainty affecting temperature measurements by BASIL above 3 km. RMS deviation values between all sensors do not exceed 2.5 K, again with the only exception of the value observed at the top of the PBL.

Fig. 6

Temperature BIAS (panel a) and RMS deviation (panel b) for the different sensors’ pairs compared on 13 September 2012

Figure 7 illustrates the time evolution of the water vapor mixing ratio as measured by BASIL over a ∼ 7.5-h time period from 16:21 to 23:55 UTC on 02 October 2012 (IOP 9), generated as a succession of 227 consecutive profiles (∆t = 2 min, ∆z = 150 m). This figure reveals the presence of a wet convective boundary layer extending up to ∼ 1.2 km, with values of \( {x}_{{\mathrm{H}}_2\mathrm{O}} \) up to 8–10 g kg−1, a second overlaying humid layer up to 3.5 km, with values of \( {x}_{{\mathrm{H}}_2\mathrm{O}} \) in the range 5–6 g kg−1, and an upper humid layer extending up to 5.5 km, with values of \( {x}_{{\mathrm{H}}_2\mathrm{O}} \) up to 3 g kg−1. Once again, the day-to-night transition is visible in the first part of the measurement session. Comparisons between BASIL, LEANDRE 2, radiosondes, and the in situ humidity sensors, illustrated in Fig. 8, were carried out during two time intervals centered at 19:44 and 21:06 UTC; these are the launching times of the radiosondes during this flight (red vertical lines in Fig. 7). More specifically, the integration time for BASIL and the in situ thermistor centered around the earlier radiosonde launch was 19:44–19:54 UTC and 19:42–20:24 UTC, respectively; the integration time for BASIL and the in situ thermistor for the later radiosonde launch was 21:06–21:16 UTC and 20:24–22:12 UTC. Again, the agreement between the different sensors is quite good throughout the vertical extent of their measurements, with deviations between BASIL, LEANDRE 2, and the radiosonde not exceeding 0.5 g kg−1 at all altitudes. Figures 9 and 10 illustrate the vertical profiles of bias and RMS deviation for the different sensors’ pairs (comparisons centered at 19:44 UTC are illustrated in Fig. 9, while comparisons centered at 21:06 UTC are considered in Fig. 10). 
The bias between BASIL, LEANDRE 2, and the radiosonde does not exceed ± 0.5 g kg−1 or ± 10% up to 2.5 km, while the bias between these three sensors and the in situ humidity sensor is as large as ± 2 g kg−1 or ± 80% up to 5 km. Figure 11 illustrates the temperature profiles for this same case study as measured by BASIL, the radiosonde (launches at 19:44 and 21:06 UTC), and the in situ temperature sensor, considering the same time intervals selected for Fig. 8. BASIL reproduces well the vertical variability of the temperature profile, with mutual deviations from the radiosonde and the in situ sensor not exceeding ± 1 K above the PBL up to 5 km. Figure 12 illustrates the vertical profiles of temperature bias and RMS deviation for the different sensors’ pairs, with the upper two panels (a and b) covering the time interval centered around the earlier radiosonde launch and the lower two panels (c and d) covering the time interval centered around the later radiosonde launch. The figure reveals that, in both cases, the mutual bias between the radiosonde and the in situ sensor does not exceed 1 K up to 5 km, being in the range 0–1 K for the earlier case and in the range ± 0.3 K for the later case. The mutual bias between BASIL and both the radiosonde and the in situ sensor does not exceed ± 1 K up to 5 km, with the only exception of the value at the top of the PBL. RMS values are smaller than 2 K up to 5 km for all sensors’ pairs.

Fig. 7

Time evolution of the water vapor mixing ratio (g kg−1) as measured by BASIL over the ∼ 7.5-h time period from 16:21 to 23:55 UTC on 02 October 2012. The red vertical lines identify the time when the comparison between the different sensors was carried out

Fig. 8

Water vapor mixing ratio profiles on 02 October 2012 from BASIL, LEANDRE 2, radiosondes, and the in situ humidity sensor. Panel a BASIL (with two integration times: 19:44–19:47 UTC and 19:44–19:54 UTC), LEANDRE 2 (20:58–21:01 UTC), radiosonde (launched at 19:44 UTC), and the in situ hygrometer (with two integration times: 19:42–20:24 UTC and 20:24–22:12 UTC). Panel b BASIL (with two integration times: 21:06–21:09 UTC and 21:06–21:16 UTC), LEANDRE 2 (21:06–21:09 UTC), radiosonde (launched at 21:06 UTC), and the in situ hygrometer (with two integration times: 19:42–20:24 UTC and 20:24–22:12 UTC)

Fig. 9

BIAS (panels a and c) and RMS deviation (panels b and d) for the different sensors’ pairs at 19:44 UTC on 02 October 2012. For the purpose of the comparisons in panels a–d, for BASIL, we considered the integration time 19:44–19:47 UTC, while for the in situ hygrometer, we considered the integration time 19:42–20:24 UTC

Fig. 10

BIAS (panels a and c) and RMS deviation (panels b and d) for the different sensors’ pairs at 21:06 UTC on 02 October 2012. For the purpose of the comparisons in panels a–d, for BASIL, we considered the integration time 21:06–21:09 UTC, while for the in situ hygrometer, we considered the integration time 20:24–22:12 UTC

Fig. 11

Temperature profiles on 02 October 2012 from BASIL, a radiosonde, and the in situ airborne thermistor. Panel a BASIL (integration time: 19:44–19:54 UTC), radiosonde (launched at 19:44 UTC), and in situ thermistor (integration time: 19:42–20:24 UTC). Panel b BASIL (integration time: 21:06–21:16 UTC), radiosonde (launched at 21:06 UTC), and the in situ thermistor (integration time: 20:24–22:12 UTC)

Fig. 12

Temperature BIAS (panels a and c) and RMS deviation (panels b and d) for the different sensors’ pairs compared on 02 October 2012. Panels a and b BASIL (integration time: 19:44–19:54 UTC), radiosonde (launched at 19:44 UTC), and in situ airborne thermistor (integration time: 19:42–20:24 UTC). Panels c and d BASIL (integration time: 21:06–21:16 UTC), radiosonde (launched at 21:06 UTC), and the in situ airborne thermistor (integration time: 20:24–22:12 UTC)

5.2 BASIL vs. LEANDRE 2

The eight flight hours allotted to the EUFAR-WaLiTemp Project were distributed over four distinct flights: flight AS38 on 13 September 2012 (with a duration of 1.2 h), flight AS42 on 02 October 2012 (with a duration of 2.7 h), flight AS57 on 29 October 2012 (with a duration of 2.7 h), and flight AS62 on 5 November 2012 (with 1.2 h dedicated to the project within an overall flight duration of 2.6 h). On 13 September 2012, LEANDRE 2 experienced a laser failure, which caused an early end of the flight. LEANDRE 2 was not operated on the flights on 29 October and 5 November 2012. Inter-comparisons between BASIL and LEANDRE 2 were successfully carried out during the initial portion of the flight on 13 September 2012 and throughout the duration of the flight on 02 October 2012. However, inter-comparisons between BASIL and the in situ sensors onboard the ATR42 were possible also on 29 October and 5 November 2012.

In addition to the specific comparisons on 13 September and 02 October 2012 during the dedicated EUFAR-WaLiTemp flights, when all sensors but the MWR were operational, other comparisons were possible in a variety of additional flights. In fact, data from LEANDRE 2 and the in situ sensors collected during an additional 12 HyMeX flights could also be included in the inter-comparison effort. These flights took place on 10 different days (11, 23, 26, and 28 September and 11, 12, 14, 15, 18, and 20 October 2012). On these days, comparisons between LEANDRE 2, BASIL, radiosondes, and the in situ humidity sensor were possible during the aircraft ascending and descending phases in the proximity of Montpellier airport, following takeoff and preceding landing. Due to eye safety issues, LEANDRE 2 acquired nadir profiles only when the ATR42 was flying above an altitude of 2 km. In this respect, it is to be specified that LEANDRE 2 started/ended collecting measurements typically 10 min after takeoff/before landing, while in situ sensors onboard the aircraft were already operational during the takeoff and landing phases. With the only exception of 11 September and 02 October 2012, when the ATR42 flew over Candillargues, LEANDRE 2 water vapor mixing ratio profiles were acquired at distances typically of 20–40 km from the ground-based lidar station (Di Girolamo et al. 2016). Inter-comparisons between LEANDRE 2 and BASIL were carried out in 26 cases, which are listed in Table 1. This table also lists the minimum distance between BASIL and the LEANDRE 2 footprint for each inter-comparison, together with the vertically averaged weighted mean values of bias and RMS deviation, as well as their percentage values. Results in Table 1 reveal that bias values are in the range from − 1.56 to 1.16 g kg−1 (or from − 22.90 to 16.87%), with larger values observed for those inter-comparisons characterized by a larger distance between BASIL and the LEANDRE 2 footprint.
RMS deviation values are larger than the corresponding bias values (in the range 0.6–3.2 g kg−1 or 8.10–45.20%), because the former quantity includes all possible differences between the two sensors. Figure 13 illustrates the variability of the bias and RMS deviation as a function of the minimum distance between BASIL and the LEANDRE 2 footprint. The figure reveals an increasing trend of the bias with increasing distance between the two sensors, while no clear trend is visible in the RMS deviations. The increase of the bias with distance possibly reflects the differences in the sounded air masses, while the lack of variability of the RMS deviation with distance may be due to the statistical uncertainties affecting water vapor mixing ratio measurements from the two lidars prevailing over the water vapor differences between the two sounded air masses. The mean bias value of LEANDRE 2 vs. BASIL is − 0.08 g kg−1 (or − 2.50%), while the RMS deviation value is 1.48 g kg−1 (or 22.90%).

Table 1 Inter-comparisons between BASIL and LEANDRE 2. The table includes the dates and times of all possible inter-comparisons, the minimum distance between BASIL and LEANDRE 2 footprint for each inter-comparison, together with the vertically averaged weighted mean values of bias and RMS deviation, as well as their percentage values, computed over the inter-comparison range between 0.5 and 3 km
Fig. 13

Vertically-averaged mean BIAS and RMS deviation between BASIL and LEANDRE 2 as a function of the minimum distance between the two sensors

5.3 BASIL vs. MWR

BASIL and the microwave radiometer were operated throughout the period 11 October–05 November 2012 at a distance of 150–200 m from each other. During this period, 348 nighttime clear sky occurrences were identified. The corresponding profiles are based on an integration time of 5 min for both BASIL and the MWR.

Figure 14 illustrates the mean bias and RMS deviation profiles for water vapor mixing ratio (panels a and b) and temperature (panel c) obtained using all 348 profile-to-profile inter-comparisons. The analysis is extended up to 7 km for water vapor mixing ratio measurements and up to 5.5 km for temperature measurements. Except for the lowest 200 m, bias values for water vapor mixing ratio measurements are within ± 0.4 g kg−1 (or ± 20%), while RMS deviation values are always smaller than 1 g kg−1 (or 55%). Bias values for temperature measurements are within 0–1.5 K, while RMS deviation values are always smaller than 2.5 K. The largest temperature deviation values are found in the PBL.

Fig. 14

Mean BIAS and RMS deviation profiles for BASIL vs. the MWR in terms of water vapor mixing ratio (panels a and b) and temperature (panel c)

5.4 MWR vs. radiosondes and in situ sensors

A far less extensive inter-comparison analysis, with a much smaller number of cases, could be carried out between the MWR and the other sensors considered in this study. More specifically, 42 profile-to-profile comparisons were considered between the MWR and the radiosondes and 14 between the MWR and the in situ sensors. Figure 15 illustrates the mean bias and RMS deviation profiles of the MWR vs. the radiosondes for water vapor mixing ratio (panels a and b) and temperature (panel c), while Fig. 16 illustrates the mean bias and RMS deviation profiles of the MWR vs. the in situ sensor for water vapor mixing ratio (panels a and b) and temperature (panel c). Bias values of the MWR vs. the radiosondes in terms of water vapor mixing ratio measurements are within ± 0.5 g kg−1, with the only exception of the value at the top of the PBL (− 1 g kg−1), while RMS deviation values are smaller than 1.5 g kg−1. Bias values between these two sensors in terms of temperature measurements are smaller than 1 K up to 5.5 km, with the only exception of the height interval 3–4 km, where bias values are larger but do not exceed 1.5 K. RMS deviation values do not exceed 1.5 K at any height. Concerning the comparison of the MWR vs. the in situ sensor in terms of water vapor measurements, bias values are within ± 1 g kg−1, while RMS deviation values are always smaller than 1.5 g kg−1. Bias values between these two sensors in terms of temperature measurements are smaller than 1 K up to 5.5 km, while RMS deviation values do not exceed 1.2 K at any height.

Fig. 15

Mean BIAS and RMS deviation profiles for the radiosondes vs. the MWR in terms of water vapor mixing ratio (panels a and b) and temperature (panel c)

Fig. 16

Mean BIAS and RMS deviation profiles of the in situ sensor vs. the MWR for water vapor mixing ratio (panels a and b) and temperature (panel c)

5.5 BASIL vs. WALI

WALI operated close to BASIL over a time period of ~ 4.5 h from 16:49 to 21:35 UTC on 30 October 2012. For the purpose of this comparison, WALI data were calibrated using the data from the radiosonde launched on-site at 21:00 UTC, considering the vertical interval 0.2–0.7 km. Figure 17 illustrates the time evolution of the water vapor mixing ratio as measured by WALI (upper panel) and BASIL (lower panel) over the ~ 4.5-h time period specified above. WALI’s time-height cross-section is obtained as a succession of ∼ 200 consecutive (with a few gaps) 1-min averaged profiles, with a vertical resolution of 15 m, while BASIL’s time-height cross-section is obtained as a succession of ∼ 300 consecutive 1-min averaged profiles, again with a vertical resolution of 15 m. Both panels show water vapor data only up to a height of 1.3–1.8 km because of the presence of a thick cloud layer above, with the only exception of the time intervals 17:10–17:45 UTC and 21:00–21:30 UTC. WALI experienced a temporary stop over the time interval 19:15–20:40 UTC. Both figure panels reveal the variability of the water vapor field during the measurement interval, with two overlying humid layers: a lower layer with a progressively growing vertical extent (from ∼ 0.7 km at 17:30 UTC to ∼ 1.2 km at 21:30 UTC) and progressively growing mixing ratio values (from ∼ 5.5 g kg−1 at 17:30 UTC to ∼ 7.5 g kg−1 at 21:30 UTC) and an upper layer in the vertical interval 1.0–1.3 km, with progressively diminishing mixing ratio values during the observation period. Both systems appear to be capable of capturing the fine structure of the two different humid layers and their detailed variability, with high time and vertical resolution.

Fig. 17

Time evolution of the water vapor mixing ratio as measured by BASIL (lower panel) and WALI (upper panel) over the ∼ 5-h time period from 16:49 to 21:35 UTC on 30 October 2012

Figure 18 illustrates the water vapor mixing ratio profiles measured by BASIL and WALI in the time interval 21:00–21:15 UTC, together with the profile from the radiosonde launched at 21:00 UTC. The comparison is illustrated up to an altitude of 4 km. The agreement between the two lidars is quite good, with deviations between the two not exceeding 0.5 g kg−1 at any altitude.

Fig. 18

Water vapor mixing ratio profiles from BASIL, WALI (integration time: 21:00–21:15 UTC), and a radiosonde (launch time 21:00 UTC) on 30 October 2012. The light blue shadow areas represent the statistical error affecting the water vapor mixing ratio profile from WALI. The apparent backscattering coefficient (ABC) is expressed in arbitrary units. GBMS indicates a ground-based meteorological station

Figure 19 illustrates the vertical profiles of mean bias and RMS deviation between BASIL and WALI up to 1.5 km. Data above this height are not illustrated because of the large statistical uncertainties affecting the Raman lidars above 1.5 km. To determine these profiles, we considered a sequence of 41 simultaneous 5-min averaged profiles from both BASIL and WALI. Bias and RMS deviation profiles show values not exceeding 0.30 g kg−1 (or 5.6%) and 0.47 g kg−1 (or 8.6%), respectively, at all altitudes. The largest values of the RMS deviation are found around 1.5 km, as a result of the large statistical uncertainty affecting WALI at these heights.
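As an illustration of how such profiles can be obtained from a set of simultaneous measurements, the following is a minimal sketch and not the processing code used in this study. It assumes that the bias at each height is the mean of the sensor-minus-sensor differences and that the RMS deviation is the root mean square of the same differences; the function name, the sign convention, and the numbers are made up for the example.

```python
import math


def bias_and_rms_profiles(profiles_a, profiles_b):
    """Return (bias, rms) lists with one value per altitude level.

    profiles_a, profiles_b: lists of simultaneous profiles from two sensors,
    each profile being a list of mixing ratios (g/kg) on a common height grid.
    Assumed convention: difference = sensor A minus sensor B.
    """
    if len(profiles_a) != len(profiles_b):
        raise ValueError("both sensors need the same number of profiles")
    n_levels = len(profiles_a[0])
    bias, rms = [], []
    for k in range(n_levels):
        # Differences at height level k over all simultaneous profile pairs
        diffs = [a[k] - b[k] for a, b in zip(profiles_a, profiles_b)]
        bias.append(sum(diffs) / len(diffs))
        rms.append(math.sqrt(sum(d * d for d in diffs) / len(diffs)))
    return bias, rms


# Made-up example: three simultaneous profiles on a two-level height grid
sensor_a = [[6.0, 5.0], [6.4, 5.2], [6.2, 4.9]]
sensor_b = [[5.8, 5.1], [6.1, 5.0], [6.3, 5.2]]
bias, rms = bias_and_rms_profiles(sensor_a, sensor_b)
print(bias, rms)
```

With real data, each inner list would hold one 5-min averaged profile interpolated to the common 15-m grid.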

Fig. 19

Vertical profiles of BIAS and RMS deviation of BASIL vs. WALI (panel a) in terms of water vapor mixing ratio and corresponding percentage values (panel b). The time interval for this comparison is from 16:49 to 21:35 UTC on 30 October 2012

It should be noted that the mutual bias between BASIL and WALI is partially attributable to the different calibration procedures applied to the two lidars. More specifically, the calibration of BASIL relies on an extended comparison effort between BASIL and the radiosondes launched in Candillargues (Di Girolamo et al. 2016), and consequently refers to a mean calibration coefficient obtained from approximately 50 comparisons of BASIL with simultaneous radiosondes, while the calibration of WALI relies on a single comparison with the radiosonde launched at 20:58 UTC on this day.

Based on the available data set, we also computed the vertically averaged weighted mean values of bias and RMS deviation, as well as their percentage values, for the 41 simultaneous and almost consecutive 5-min averaged profiles from both BASIL and WALI (Table 2). The vertical interval considered in the vertically averaged mean is up to 1.5 km. Results in Table 2 reveal the presence of bias values in the range − 0.47–0.34 g kg−1 (or − 8.63–4.11%), absolute bias values in the range 0.10–0.47 g kg−1 (or 1.07–8.63%), as well as RMS deviation values in the range 0.20–0.57 g kg−1 (or 4.46–10.70%).

Table 2 Inter-comparisons between BASIL and WALI. The table includes dates and times of all inter-comparisons, together with the vertically averaged weighted mean values of bias, absolute bias and RMS deviation, as well as their percentage values, computed over the inter-comparison range between 0.5 and 1.5 km. The start time of each 5-min interval is included in the table

5.6 Assessment of the BIAS and RMS deviation between the different sensors

The performance of the different profiling sensors considered in the present study, in terms of both water vapor mixing ratio and temperature, is assessed through a dedicated statistical analysis including all possible comparisons for all possible sensor pairs. Specifically, the analysis includes 26 profile-to-profile comparisons of BASIL vs. LEANDRE 2, 348 of BASIL vs. the MWR, 26 of BASIL vs. the in situ sensors, 32 of BASIL vs. the radiosondes, 41 of BASIL vs. WALI, 42 of the MWR vs. the radiosondes, and 14 of the MWR vs. the in situ sensors. Figures 20 and 21 show the water vapor mixing ratio and temperature BIAS and RMS deviation profiles, respectively, for all sensor pairs. Profiles are illustrated from 0.5 to 3 km for water vapor sensor pairs and from 0.5 to 6 km for temperature sensor pairs.

Fig. 20

Water vapor mixing ratio percentage BIAS and BIAS (panels a and c) and percentage RMS deviation and RMS deviation (panels b and d) for the different sensors’ pairs determined using all available profile-to-profile inter-comparisons

Fig. 21

Temperature BIAS (panel a) and RMS deviation (panel b) for the different sensors’ pairs determined using all available profile-to-profile inter-comparisons

Concerning water vapor mixing ratio measurements, bias values are in the range ± 1 g kg−1 (or ± 10%) above 1 km. Below 1 km, bias values are larger (up to 3 g kg−1 or 25%) for those comparisons characterized by a larger distance between the sounded air masses (radiosondes vs. LEANDRE 2 and BASIL vs. radiosondes). More specifically, above 1 km the bias values of BASIL vs. the radiosondes are in the range ± 0.5 g kg−1 (or ± 7%), with similarly small values also observed in the comparisons of BASIL vs. the in situ sensor and BASIL vs. LEANDRE 2, in addition (as reported in the previous section) to the comparison of BASIL vs. WALI. Above 1 km, larger bias values (up to ± 2 g kg−1 or ± 30%) characterize the comparisons of the radiosondes vs. the in situ sensor, of the in situ sensor vs. LEANDRE 2, of the radiosondes vs. LEANDRE 2, and of the in situ sensor vs. the MWR, with values increasing with height as a result of the increasing distance between the sounded air masses. RMS deviation values are smaller than 3 g kg−1 or 40% above 1 km.

Concerning the temperature measurements, bias values are in the range ± 1 K above 1.5 km, with the only exception of the comparison of the MWR vs. the radiosondes, with values up to 1.5 K. Below 1.5 km, bias values are larger (up to 2 K) for those comparisons characterized by a larger distance between the sounded air masses (BASIL vs. radiosondes and BASIL vs. in situ sensor).

Based on the application of the weighted mean defined in expressions (3) and (4), we determined the values of the vertically averaged mean mutual bias, \( \overline{\mathrm{bias}} \), and RMS deviation, \( \overline{\mathrm{RMS}} \), and the vertically averaged mean absolute mutual bias, \( \left|\overline{\mathrm{bias}}\right| \), over the entire inter-comparison range (0.5–3 km for all sensor pairs including LEANDRE 2, 0.5–1.5 km for BASIL vs. WALI, and 0.5–6 km for all other sensor pairs). Values of \( \overline{\mathrm{bias}} \) and \( \overline{\mathrm{RMS}} \) for the different sensor pairs involved in the inter-comparison effort are listed in Table 3 for both water vapor mixing ratio and temperature measurements.
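The vertical averaging step can be sketched as follows. This is only an illustrative example: expressions (3) and (4) are not reproduced in this section, so the weights used here (the number of valid comparisons at each height) are a hypothetical choice, and the profile values are made up.

```python
def weighted_vertical_mean(values, weights):
    """Weighted mean of a vertical profile; weights must be non-negative."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Made-up bias profile (g/kg) at 0.5, 1.0 and 1.5 km
bias_profile = [0.30, -0.10, 0.20]
# Hypothetical weights: number of valid comparisons at each height
weights = [41, 41, 20]

mean_bias = weighted_vertical_mean(bias_profile, weights)
mean_abs_bias = weighted_vertical_mean([abs(b) for b in bias_profile], weights)
print(mean_bias, mean_abs_bias)
```

Because the made-up profile changes sign with height, the mean absolute bias comes out larger than the magnitude of the mean bias, which is the behavior discussed for the values in Table 3.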

Table 3 Water vapor mixing ratio and temperature vertically averaged mean BIAS, absolute BIAS and RMS deviation values for the different sensors’ pairs involved in the inter-comparison effort. The inter-comparison range is up to 1.5 km for BASIL vs. WALI, up to 3 km for all sensors’ pairs including LEANDRE 2, and up to 6 km for all other sensors’ pairs

Concerning water vapor mixing ratio measurements, values of \( \overline{\mathrm{bias}} \) are smaller in magnitude than 0.8 g kg−1 or 9% for all sensor pairs. \( \left|\overline{\mathrm{bias}}\right| \) values are equal to or larger than the magnitude of the \( \overline{\mathrm{bias}} \) values, but do not exceed 1.1 g kg−1 and 11.05%. More specifically, the smallest \( \overline{\mathrm{bias}} \) values are observed between BASIL and WALI (− 0.005 g kg−1), between BASIL and LEANDRE 2 (− 0.08 g kg−1) and between BASIL and the MWR (− 0.02 g kg−1). Similarly small values are observed between the radiosondes and the MWR (− 0.13 g kg−1). Bias values smaller than 0.3 g kg−1 are found between BASIL and the radiosondes (0.28 g kg−1), between BASIL and the in situ sensor (0.22 g kg−1) and between LEANDRE 2 and the radiosondes (0.21 g kg−1), while bias values smaller in magnitude than 0.4 g kg−1 are found between the in situ sensor and the radiosondes (0.36 g kg−1) and between the in situ sensor and the MWR (− 0.37 g kg−1). The largest bias value is observed between LEANDRE 2 and the in situ sensor (0.76 g kg−1). Percentage bias values are found to range between 1.1% and 8.9%. Small \( \left|\overline{\mathrm{bias}}\right| \) values are observed between the radiosondes and the MWR (0.20 g kg−1), between BASIL and the MWR (0.22 g kg−1), and between BASIL and WALI (0.24 g kg−1). Slightly larger \( \left|\overline{\mathrm{bias}}\right| \) values are observed between the radiosondes and the in situ sensor (0.36 g kg−1), between the in situ sensor and the MWR (0.58 g kg−1), and between BASIL and the in situ sensor/the radiosondes/LEANDRE 2 (0.43 g kg−1, 0.51 g kg−1, 0.67 g kg−1, respectively). The largest \( \left|\overline{\mathrm{bias}}\right| \) values are observed between LEANDRE 2 and the in situ sensor (0.76 g kg−1) and between LEANDRE 2 and the radiosondes (1.10 g kg−1).
\( \overline{\mathrm{RMS}} \) deviation values are in the range 0.36–2.20 g kg−1 or 7.42–33.26%, the lowest values being observed between BASIL and WALI (0.36 g kg−1 or 7.42%) and the highest values being observed between LEANDRE 2 and the radiosondes (2.20 g kg−1 or 33.26%). As already specified above, RMS deviation values are always larger than the corresponding bias values. In this regard, it should be recalled that the RMS deviation includes not only the differences associated with the systematic error sources affecting the two instruments, but also those associated with their statistical uncertainties and the natural differences of the measured atmospheric parameter (either water vapor mixing ratio or temperature) in the different air masses sounded by the two compared sensors. The largest differences in air mass properties are found within the PBL, where the inhomogeneity of both the thermal and humidity fields is greatest. Consequently, large \( \overline{\mathrm{RMS}} \) values within the PBL are caused by the variability of the thermal and humidity fields within this height region.
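That the RMS deviation can never be smaller than the magnitude of the bias follows from a standard identity. As a sketch, assume that at a given height the bias and the RMS deviation are the mean and the root mean square of the \( N \) differences \( d_i \) between the two sensors (the symbols \( d_i \), \( N \), and \( \sigma_d \) are introduced here for illustration only):

\[ \mathrm{RMS}^2 = \frac{1}{N}\sum_{i=1}^{N} d_i^2 = \mathrm{bias}^2 + \sigma_d^2, \qquad \sigma_d^2 = \frac{1}{N}\sum_{i=1}^{N} \left( d_i - \mathrm{bias} \right)^2 . \]

Under this assumption \( \mathrm{RMS} \ge \left|\mathrm{bias}\right| \), with equality only if all differences are identical. The term \( \sigma_d^2 \) collects the scatter due to statistical uncertainties and to the natural variability between the sounded air masses.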

Concerning temperature measurements, absolute values of \( \overline{\mathrm{bias}} \) are in the range 0.15–0.75 K, with the smallest value characterizing the comparison between BASIL and the in situ sensor (0.15 K), the bias between BASIL and the radiosondes being − 0.43 K, between BASIL and the MWR being 0.62 K, between the in situ sensor and the radiosondes being 0.61 K, and between the radiosondes/in situ sensor and the MWR being 0.74/0.75 K. Values of \( \left|\overline{\mathrm{bias}}\right| \) coincide with those of \( \overline{\mathrm{bias}} \) for 4 out of 6 comparisons, which indicates that, for these sensor pairs, one of the two sensors has values systematically larger (or smaller) than the other at all altitudes. \( \overline{\mathrm{RMS}} \) deviation values are in the range 0.94–1.48 K.
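The link between coincident values and a bias of constant sign can be made explicit. As a sketch, assume that the vertical averages are weighted means over heights \( z_k \) with non-negative weights \( w_k \) (symbols introduced here for illustration; the actual weights are those of expressions (3) and (4)). The triangle inequality then gives

\[ \left| \frac{\sum_k w_k \, \mathrm{bias}(z_k)}{\sum_k w_k} \right| \le \frac{\sum_k w_k \left| \mathrm{bias}(z_k) \right|}{\sum_k w_k} , \]

with equality if and only if \( \mathrm{bias}(z_k) \) has the same sign at every height carrying a non-zero weight.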

5.7 Overall bias affecting all sensors

The knowledge of the mutual biases between the different sensor pairs allows determining the overall biases affecting all considered sensors with respect to the mean measurement value (Behrendt et al. 2007a, 2007b). For this purpose, one of the sensors involved in the inter-comparison effort must have collected simultaneous and co-located data, even for a short period, with all other sensors. In the present research effort, we considered the Raman lidar system BASIL for this purpose. Assuming equal weight on the data reliability, and consequently the accuracy, of each sensor, an estimate of the overall bias affecting the different sensors is obtained by imposing that the summation of all mutual biases between sensor pairs is equal to zero. The overall bias obtained through this approach is the bias of each instrument when referred to the mean of all instruments. This bias is defined over the altitude interval considered for the inter-comparison effort, i.e., from surface level up to 6 km for all sensor pairs, with the only exception of those pairs including the LEANDRE 2 water vapor mixing ratio profiles, which extended up to 3 km, and BASIL vs. WALI, which extended up to 1.5 km. Results obtained through the application of this approach are illustrated in panels a and b of Fig. 22 for water vapor mixing ratio and temperature measurements, respectively. Concerning water vapor mixing ratio measurements, the overall biases for BASIL, LEANDRE 2, WALI, the radiosondes, the microwave radiometer, and the aircraft in situ sensors are 0.079 g kg−1, 0.159 g kg−1, 0.084 g kg−1, − 0.201 g kg−1, 0.099 g kg−1, and − 0.141 g kg−1, respectively. Concerning temperature measurements, the overall bias is 0.11 K, 0.54 K, − 0.04 K, and − 0.51 K for BASIL, the radiosondes, the microwave radiometer, and the aircraft in situ sensor, respectively.
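The arithmetic behind these numbers can be sketched as follows. This is one reading of the zero-sum constraint, not the authors' code. It assumes that each mutual bias is defined as BASIL minus the other sensor, and that the overall biases of the non-reference sensors sum to zero. Both assumptions, like the function name, are introduced here because they reproduce the water vapor values quoted above from the mutual biases given in Sect. 5.6.

```python
def overall_biases(reference, mutual_bias):
    """Estimate the overall bias of each sensor from mutual biases.

    mutual_bias maps sensor name -> mean bias of (reference minus sensor).
    Assumption: the overall biases of the non-reference sensors sum to zero,
    so the reference bias equals the mean of its mutual biases.
    """
    ref_bias = sum(mutual_bias.values()) / len(mutual_bias)
    result = {reference: ref_bias}
    for name, delta in mutual_bias.items():
        # delta = ref_bias - sensor_bias, hence sensor_bias = ref_bias - delta
        result[name] = ref_bias - delta
    return result


# Water vapor mutual biases of BASIL vs. each sensor (g/kg), from Sect. 5.6
water_vapor = {
    "WALI": -0.005,
    "LEANDRE 2": -0.08,
    "MWR": -0.02,
    "radiosondes": 0.28,
    "in situ": 0.22,
}
wv_overall = overall_biases("BASIL", water_vapor)
for sensor, value in wv_overall.items():
    print(f"{sensor}: {value:+.3f} g/kg")
```

Only the comparisons involving the reference sensor enter this calculation, which is why one sensor must have been compared with all the others.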

Fig. 22

Overall bias affecting water vapor (a) and temperature (b) data

The overall measurement bias is found to be within approximately ± 0.2 g kg−1 for all water vapor sensors, which confirms the high measurement accuracy of the sensors included in this inter-comparison effort.

With respect to the temperature measurements, it should be noted that, even if the calibration of BASIL is based on the use of the simultaneous radiosonde profiles up to 2 km, the overall bias difference between BASIL and the radiosondes of ∼ 0.4 K is compatible with the variability of the thermal field above 2 km and up to 6 km, the latter being the upper altitude level considered in the computation of the mutual bias between BASIL and the radiosondes. In fact, this altitude region is usually characterized by strong vertical gradients, especially in the proximity of and above the PBL top, where changes associated with temperature inversions are frequently observed.

The overall bias values reported in this section for the different commercial sensors are compatible with the uncertainties specified by the manufacturers, which were reported in sub-sections “Microwave radiometer,” “Thermodynamic sensors onboard the ATR42,” and “Radiosoundings.” Specifically, concerning water vapor mixing ratio measurements, the radiosounding bias specified by the manufacturer (0.20–0.25 g kg−1) is in good agreement with the overall bias estimated here (0.201 g kg−1); the microwave radiometer bias, specified by the manufacturer to be 0.08 g kg−1, is in good agreement with its estimated overall bias of 0.099 g kg−1; and the aircraft in situ humidity sensor bias, specified by the manufacturer to be 0.05–0.07 g kg−1, is in good agreement with the estimated overall bias of 0.141 g kg−1. Concerning temperature measurements, the radiosounding, the microwave radiometer and the aircraft in situ sensor biases, specified by their manufacturers to be 0.3 K, 0.5–1 K and 0.1 K, respectively, are in good agreement with the overall biases estimated in this section, i.e., 0.54 K, 0.04 K and 0.51 K, respectively.

6 Summary and future perspectives

High-quality water vapor and temperature observations are necessary to improve our understanding of the Earth’s climate system, as well as to improve weather forecasts. Unfortunately, despite their importance, conventional observational techniques for measuring tropospheric water vapor and temperature are characterized by non-negligible biases and/or limited coverage. For example, passive remote sensing systems in space, due to the intrinsic limitations in the inversion of the radiative transfer equation, are characterized by a vertical resolution too coarse (> 1 km) to allow temperature and moisture gradients in the lower troposphere to be resolved. Similar arguments hold for GNSS-RO, which is characterized by a poor horizontal resolution (~ 450 km) and for which the water vapor and temperature information in the refractivity measurements cannot be disentangled in the lower troposphere without other information from remote sensing or model output. On the other hand, the global radiosonde network provides high vertical resolution profiles at discrete locations often separated by hundreds of kilometers and typically with 24- or 12-hourly time resolution.

Lidar systems based on the application of the DIAL and Raman techniques have the potential to fill these observational gaps. More specifically, lidar measurements of atmospheric humidity and temperature are characterized by high time-vertical resolution and accuracy, as well as by the capability to cover a substantial portion of the troposphere. Based on the dataset collected in the frame of HyMeX-SOP1, the present paper provides accurate error estimates for several lidar systems and other sensors measuring vertical profiles of water vapor and/or temperature. For this purpose, the University of BASIlicata ground-based Raman Lidar—BASIL—system was deployed in Candillargues (Southern France, Lat: 43° 37′ N; Long: 4° 4′ E) and was operated for a 2-month period (September through November 2012) in the framework of the HyMeX-SOP1.

This inter-comparison effort aims at providing accurate error estimates for the lidar systems and the other sensors involved in this field effort. For this purpose, simultaneous and co-located or almost co-located data from the different sensor pairs were used to compute mutual bias and RMS deviations as a function of altitude. Vertical profiles of the mutual bias and RMS deviation for the different sensor pairs were determined in the vertical interval between 0.5 and 1.5/3/6 km, depending on the considered sensor pairs and their vertical coverage. Mutual biases affecting water vapor mixing ratio profile measurements are found to be in the range ± 1 g kg−1 (or ± 10%) above 1 km, while below this height bias values are larger (up to 3 g kg−1 or 25%), especially for those comparisons characterized by a large distance between the compared sensors and the consequent differences between the sounded air masses. Mutual biases affecting temperature profile measurements are typically in the range ± 1 K above 1.5 km, while larger bias values (up to 2 K) are observed below this level.

Values of the vertically averaged mean bias, \( \overline{\mathrm{bias}} \), and RMS deviation, \( \overline{\mathrm{RMS}} \), and of the vertically averaged mean absolute bias, \( \left|\overline{\mathrm{bias}}\right| \), were also determined. Concerning water vapor mixing ratio measurements, values of \( \overline{\mathrm{bias}} \) are smaller than 0.8 g kg−1 (or 9%) for all sensor pairs, values of \( \left|\overline{\mathrm{bias}}\right| \) are smaller than 1.1 g kg−1 (or 17.75%), while values of \( \overline{\mathrm{RMS}} \) are in the range 0.36–2.20 g kg−1 or 7.42–33.26%. Concerning temperature measurements, values of \( \overline{\mathrm{bias}} \), \( \left|\overline{\mathrm{bias}}\right| \), and \( \overline{\mathrm{RMS}} \) are smaller than 0.75 K, 0.77 K, and 1.71 K, respectively, for all sensor pairs.

The possibility of assessing the overall bias of all sensors included in this inter-comparison effort was also exploited, benefiting from the circumstance that the Raman lidar BASIL could be compared with all other sensor/model data. The overall bias affecting water vapor/temperature profile measurements is 0.079 g kg−1/0.11 K, 0.159 g kg−1, 0.084 g kg−1, − 0.201 g kg−1/0.54 K, 0.099 g kg−1/− 0.04 K, and − 0.141 g kg−1/− 0.51 K for BASIL, LEANDRE 2, WALI, the radiosondes, the microwave radiometer, and the aircraft in situ sensor, respectively. For all water vapor sensors, the overall bias is within approximately ± 0.2 g kg−1.

The present study demonstrates the high quality of the water vapor and temperature profile measurements carried out by Raman and differential absorption lidars. The results illustrated and discussed in this paper provide confidence in the possibility of using long-term records of these measurements in weather forecast and process studies, for monitoring changes in atmospheric composition and thermal structure and, ultimately, for climate trend studies.