Background & Summary

Air quality remains a major concern in many parts of Europe, especially in urban areas1. The most important air pollutants in terms of health are particulate matter (PM), nitrogen dioxide (NO2), and ground-level ozone (O3).

For the efficient implementation of air quality policies, air quality monitoring data with high spatial density, high temporal resolution and sufficient quality are needed. Such data can supplement the data from Air Quality Monitoring Stations (AQMSs) that are used to assess ambient air quality in Europe as defined in Directive 2008/50/EC.

Low-cost air quality sensor systems consist of an integrated set of hardware that uses one or more sensors to measure the quantity of a chemical species and can supply real-time measurements2.

Because they cost less than the reference air quality monitoring methods3, sensor systems can be deployed at high density, which makes them strong candidates as complementary tools for improved air quality management. However, their data quality is still often poor or unknown3. Sensor signals can be affected by interfering compounds, temperature, humidity, pressure and signal drift over time4,5,6.

The European Commission Joint Research Centre (JRC) recently conducted a research project to evaluate a low-cost sensor system, the AirSensEUR, as a supplemental tool for reference air quality monitoring. The AirSensEUR sensor system (Fig. 1) contains sensors to monitor NO, NO2, O3, CO, PM2.5, PM10, PM1, CO2 and meteorological parameters (temperature, relative humidity and atmospheric pressure). Within this project, air quality data from 85 AirSensEUR units and, in part, from the AQMSs were collected. The 85 sensor systems were deployed for about a year in three European cities: Antwerp (Belgium, 34 systems), Oslo (Norway, 34 systems) and Zagreb (Croatia, 17 systems). The aims of the project were to gain insight into the performance of low-cost sensors under different meteorological and ambient conditions, and to explore and evaluate calibration approaches7.

Fig. 1

AirSensEUR sensor systems: (top left) chemical shield with gas sensors; (top right) sensor box with sampling inlet of OPC-N3 on top; (bottom left) sensor box in outdoor enclosure with gas sensors and PMS5003 sensor inlet and (bottom right) sensor box installed at Air Quality Monitoring Stations.

In this paper, we report a unique dataset that includes the raw data of quality-controlled sensor networks together with comprehensive reference datasets for the periods when the sensors were co-located at an AQMS. With the permission of the European Commission, we make these data available to the research community to enable further research without the very time- and resource-consuming process of collecting such data.

Although many articles8, reports (https://vaquums.eu/deliverables) and web pages (https://web.jrc.ec.europa.eu/rapps/pub/aqsensors/, http://www.aqmd.gov/aq-spec/evaluations#&MainContent_C001_Col00=2, https://www.epa.gov/air-sensor-toolbox) discuss the accuracy of sensor systems, to the best of our knowledge no substantial open dataset presenting co-located raw sensor data and reference measurements has been published, except for one study reporting data from twelve particulate matter sensors co-located with a beta-attenuation PM mass monitor for three months9. Research topics may include, but are not limited to, developing and testing calibration models for air quality sensors, implementing correction algorithms and evaluating sensor performance under different environmental conditions. The data may also help to develop calibration models and evaluate their drift, and contribute to a better understanding of sensor performance, which could drive advances in sensor technology and improve sensor data quality.

The main data collection period lasted from April 2020 until April 2021, but not all sensor systems acquired data for the entire period. The sensor systems were first co-located at a reference AQMS in each city, then deployed at various locations in the city and after that co-located again at the same reference AQMS.

An overview of the timeline and sampling sites is given in Fig. 2.

Fig. 2

Overview of the sampling with sampling site locations and timeline. Map data ©2022 Google; Imagery ©2022 NASA, TerraMetrics.

Broad ranges of meteorological conditions and pollutant concentrations were covered during co-locations and deployment.

The deployment sites in each city are well distributed across the city and are characterised by different degrees of traffic impact. Some of the sites are AQMSs and were selected to compare sensor and reference data over a longer time. In addition, some sites were selected very close to each other to assess short-term spatial variability. At some of the AQMSs, duplicate sensor systems were deployed to evaluate the between-sensor variance.

Methods

AirSensEUR sensor systems

The sensor systems used in this study were AirSensEUR version 3.07. AirSensEUR is an open platform project developed by the JRC in collaboration with Liberaintentio S.r.l. (IT), aimed at measuring air quality accurately using low-cost sensors (LCSs). The hardware schematics, the software running on the units and the calibration procedures implemented as post-processing of the collected data are released under public licenses (https://github.com/ec-jrc/airsenseur-sensorsshield, https://github.com/ec-jrc/airsenseur-sensorshost, https://github.com/ec-jrc/airsenseur-box).

The data collected by the units were stored locally and periodically sent to an InfluxDB database on an external server7 for offline post-processing and/or calibration. Data were generally transferred via GPRS/LTE, and via WiFi for a few units.

AirSensEUR includes a PTFE enclosure measuring 26 cm × 22 cm × 10 cm and weighing 2 kg, battery included (see Fig. 1 top). The PTFE enclosure was inserted in a protective stainless-steel cover. The overall size of the stainless-steel cover is 35 cm × 32 cm × 30 cm, except for the top cover, which is a 42 cm × 45 cm aluminium plate (see Fig. 1 bottom).

Table 1 gives an overview of the measured pollutants, the sensor types and their manufacturers. The OPC-N3 has 24 size bins (0.3/0.35–40 µm) with a counting efficiency of 50% at 0.3 µm and 100% at 0.35 µm, and the PMS5003 has 6 size bins (>0.3 µm) with a counting efficiency of 50% at 0.3 µm and 100% at 0.5 µm. These counting efficiencies are claimed by the manufacturers, although experiments have shown that the counting efficiency of the PMS5003 sensor is about 80% at 0.5 µm10. To our knowledge, no publications on the counting efficiency of the OPC-N3 are available.

Table 1 Sensors included in the AirSensEUR sensor systems including sensor variables as provided in the dataset.

The gas sensors for NO2, CO, NO and O3 are installed on the AirSensEUR Chemical sensor Shield (version R31), while the PM and CO2 sensors are installed on the Exp1Shield R10 sensor shield7. In addition, the sensor box is equipped with sensors monitoring temperature and relative humidity inside the AirSensEUR box near the chemical sensors, and with sensors on a Flyboard monitoring ambient air temperature, relative humidity and atmospheric pressure outside the box. The information presented in Table 1 is also given in the metadata file metadata_sensors.csv11.

Sensor box sampling periods

The data were collected between April 2020 and April 2021. The exact sampling intervals differed slightly between cities. An overview of the sampling timeline is given in Fig. 2, which shows the dates of the feasibility study in Ispra, the pilot studies in the cities, and the first co-location, deployment at different sites and second co-location in each city. Details on the sampling are given in the paragraphs below. A detailed overview of the start and stop dates at each location is given in the metadata file metadata_dates.csv11.

Prior to the sampling campaigns in the cities, selected sensor systems were deployed at the EMEP-ABSIS-ICOS station of the JRC in Ispra (IT) as an initial feasibility study. Subsequently, pilot studies were performed in each city before the sampling campaigns, using the same ten boxes as in the feasibility study, to test data transfer, installation, etc. In each city, the sampling campaigns included three consecutive phases:

  • co-location of all sensor systems at an AQMS in the city (hereafter called ‘first co-location’)

  • deployment of the sensor systems at different locations of the same city (hereafter called ‘deployment’)

  • co-location of all sensor systems at the same AQMS as in the first co-location (hereafter called ‘second co-location’)

Feasibility study in Ispra and pilot studies in the three cities (Antwerp, Oslo and Zagreb) prior to the main co-locations and deployment campaigns

Ten sensor systems were installed at the EMEP-ABSIS-ICOS station of the JRC in Ispra (IT), a semi-rural site in Northern Italy12, between 17 and 31 January 2020 (Fig. 2). The same ten systems were used in the pilot studies in the three cities: four sensor systems each were deployed in Antwerp (40641B, 4065D0, 4065E0 and 4065E3) and in Oslo (40458D, 40642E, 4065ED and 40816F), and two in Zagreb (4047D0 and 406414). The purpose of this study was to check the reliability of the AirSensEUR sensor systems and to collect data for calibration at a semi-rural site. The characteristics of the reference air pollution analysers and meteorological parameters at the EMEP-ABSIS-ICOS station are given in Table 2. The gas analysers were routinely calibrated, and daily calibration checks were performed to detect and correct possible drifts of the monitoring equipment.

Table 2 The reference analysers used at the AQMSs.

The sensor systems used in the feasibility study in Ispra were also included in the initial pilot studies in Antwerp, Oslo and Zagreb before the first co-location, in order to check correct deployment and operation at a few field sites (see Fig. 2 and file metadata_dates.csv11).

Common naming-convention for sampling site description

A common naming convention for sampling site description is used for the three consecutive phases in the three cities. The sampling site labels (IDs) are of the form XXX_YYY_ZZZ(Z), where:

  • XXX refers to the city: ANT (Antwerp); OSL (Oslo); ZAG (Zagreb);

  • YYY describes the type of location: URB (urban or suburban background); TRA (traffic site in an urban or suburban area); RUR (rural site); REF (AQMS with reference measurements, without any further characterisation);

  • ZZZ(Z), three or four characters long, refers to the street name of the location or the name of the AQMS.
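Because the site labels are structured, they can be split programmatically, for example to group data by city or by site type. The sketch below is a hypothetical helper, not part of the dataset or of the AirSensEUR tools. It splits on underscores rather than on fixed field widths, because some labels used in the text (e.g. OSL_TRAF_VINK) carry a four-letter type code.

```python
from typing import NamedTuple

# City codes from the naming convention
CITY_CODES = {"ANT": "Antwerp", "OSL": "Oslo", "ZAG": "Zagreb"}


class SiteLabel(NamedTuple):
    city: str       # full city name
    site_type: str  # e.g. URB, TRA, RUR, REF
    name: str       # street name or AQMS code


def parse_site_label(label: str) -> SiteLabel:
    """Split a label such as 'ANT_REF_R801' into city, site type and name."""
    # maxsplit=2 keeps any further underscores inside the name field
    parts = label.split("_", 2)
    if len(parts) != 3 or parts[0] not in CITY_CODES:
        raise ValueError(f"unexpected site label: {label!r}")
    city_code, site_type, name = parts
    return SiteLabel(CITY_CODES[city_code], site_type, name)


print(parse_site_label("ANT_REF_R801"))
# SiteLabel(city='Antwerp', site_type='REF', name='R801')
```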

Main co-locations and deployment campaigns in Antwerp

The sensor systems were installed at the AQMS 42R801 in Borgerhout for the first co-location on 2020-04-02 and 2020-04-03, and sampling lasted roughly until 2020-06-05 (about 72 days). Between 2020-06-15 and 2020-06-18, the sensor systems were moved to their deployment sites, apart from two units that were installed on 2020-06-22. The sensor systems stayed at the deployment sites for approximately 8 months. Between 2021-02-17 and 2021-02-26, the sensor systems were taken down from their deployment locations and installed at the same AQMS for the second co-location, which lasted until 2021-04-13 (about 45 days). A detailed overview of the start and stop dates at each location (deployment sites) is given in the metadata file metadata_dates.csv11 and is displayed in Fig. 4.

Main co-locations and deployment campaigns in Oslo

The sensor systems involved in the first co-location were installed at the Kirkeveien AQMS between 2020-08-26 and 2020-08-28, and sampling lasted roughly until 2020-10-14 (about 48 days), except for two sensor systems (4065ED and 40458D) that stayed at the pilot sites (OSL_TRAF_VINK and OSL_TRAF_LIND). Subsequently, all units except two were moved to their deployment sites. Installation at the deployment sites started on 2020-10-16; by 2020-12-01 most sensor systems were operational, and they operated until 2021-03-08 (roughly 88 days). The sensor systems were installed for the second co-location on 2021-03-08 and 2021-03-10, and the second co-location lasted until 2021-04-09 (roughly 31 days). One sensor system collected data over a very limited period only. A detailed overview of the start and stop dates at each location (deployment sites) is given in the metadata file metadata_dates.csv11 and is displayed in Fig. 6.

Main co-locations and deployment campaigns in Zagreb

The sensor systems were installed for the first co-location at the IMI AQMS on 2020-05-18 and 2020-05-27, and the co-location lasted roughly until 2020-07-15 (around 60 days). The deployment period lasted roughly from 2020-07-20 to 2021-02-18 (approximately 7 months). The second co-location lasted roughly from 2021-03-03 to 2021-04-12 (approximately 37 days). A detailed overview of the start and stop dates at each location (deployment sites) is given in the metadata file metadata_dates.csv11 and is displayed in Fig. 8.

Co-location sites

Details of the AQMSs where the co-location campaigns took place, including the measured pollutants and the reference analysers at each AQMS, are given in Table 2. In Oslo (OSL_REF_KVN), Palas Fidas 200 data are reported only for the first co-location; for the second co-location, data from the TEOM instrument are reported instead. For the naming convention of the test sites in Table 2, we refer to the section “Sensor locations: deployment sites”.

Co-location site in Antwerp

The AQMS ANT_REF_R801 (Station 42R801 Borgerhout, see www.irceline.be; 4.43178°E, 51.20961°N, at an altitude of 10 m) is an urban background station located 30 m from the main road Plantin en Moretuslei.

The reference monitoring at ANT_REF_R801 includes PM10, PM2.5, NOx (NO2 and NO), CO, CO2 and O3 (see Table 2). SO2, BC and UFP are also monitored at the station. CO and CO2 are not permanently monitored: during the two co-locations, one CO monitor and one CO2 monitor were temporarily installed at the station (sharing the inlet of the other gases). The list of AirSensEUR sensor systems co-located at the ANT_REF_R801 station is given in metadata_dates.csv11.

Co-location site in Oslo

The Kirkeveien AQMS (OSL_REF_KVN), located at 10.72455°E, 59.93230°N at an altitude of 58.3 m, is a traffic station next to an urban ring road with an average daily traffic intensity of ca. 15,000 vehicles.

The reference monitoring at OSL_REF_KVN normally includes PM10, PM2.5, CO, NO and NO2. An O3 monitor was additionally installed at the station for the two co-location campaigns. In addition to two TEOM PM monitors with PM2.5 and PM10 inlets, a Palas Fidas 200 instrument was also operational during the first co-location campaign. The list of AirSensEUR sensor systems co-located at the OSL_REF_KVN station is given in metadata_dates.csv11.

Co-location site in Zagreb

The IMI AQMS (ZAG_REF_IMI), located at 45.835305°N, 15.977822°E, at an altitude of 195 m, is an urban background station within the Zagreb network for air quality monitoring.

The reference monitoring at ZAG_REF_IMI includes PM10, PM2.5, NOx (NO2 and NO), CO, O3, SO2 and benzene. The list of AirSensEUR sensor systems co-located at the ZAG_REF_IMI station is given in metadata_dates.csv11.

Sensor locations: deployment sites

The sensor locations in the three cities during the deployment are shown in Figs. 3, 5 and 7. Detailed information on the timeline of data collection for the different sensor systems and their locations is displayed in Figs. 4, 6 and 8. The naming convention XXX_YYY_ZZZ(Z) is explained above under the heading ‘Common naming-convention for sampling site description’. The deployment sites were selected to ensure good spatial coverage of each city and a suitable distribution between background sites, traffic sites and AQMSs. The deployment sites are characterised by different degrees of traffic impact, both in terms of traffic density and of distance to the street. Most sampling sites are at locations other than an AQMS (hereafter referred to as ‘dedicated sites’). Some sites were selected very close to each other (neighbouring sensors), with variation in traffic density, to assess short-term spatial variability. Some sensor systems were installed at the AQMSs to check the agreement between sensors and reference analysers over a longer period than the co-location periods. At some of the AQMSs, duplicate systems were deployed to evaluate between-sensor variance. In total, three, three and one pairs of duplicate sensor systems were installed in Antwerp, Oslo and Zagreb, respectively: 40499C – 4043A7, 4049A6 – 4067BD and 4043AE – 4067B3 in Antwerp; 40642E – 64FD11, 64E9C5 – 65063E and 649526 – 42816E in Oslo; and 4047D0 – 427907 in Zagreb. The locations of these boxes during the deployment are given in metadata_dates.csv, and the locations are described in metadata_sites.csv11. The number of sensor systems installed in each city and their distribution over AQMSs and dedicated sites are given in Table 3.
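For duplicate pairs, one common way to summarise between-sensor variability, borrowed from reference-method equivalence testing (the between-sampler uncertainty of EU equivalence guidance), is u_bs = sqrt(Σ(y_i,1 − y_i,2)² / 2n) over n time-aligned pairs. The source does not state which metric the project used; the sketch below, with a hypothetical function name, is only an illustration and assumes both series have already been aligned minute by minute, with missing values as None.

```python
import math
from typing import Optional, Sequence


def between_sensor_uncertainty(a: Sequence[Optional[float]],
                               b: Sequence[Optional[float]]) -> float:
    """u_bs = sqrt(sum((a_i - b_i)**2) / (2 * n)) over complete, time-aligned pairs."""
    if len(a) != len(b):
        raise ValueError("series must be time-aligned and of equal length")
    # Skip minutes where either duplicate has no value (empty field -> None)
    squared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not squared:
        raise ValueError("no complete pairs")
    return math.sqrt(sum(squared) / (2 * len(squared)))
```

Applied to raw signals, the result is expressed in signal units; after calibration it is expressed in concentration units.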

Fig. 3

Sensor deployment sites in Antwerp. Imagery ©2022 Google, Imagery ©2022 TerraMetrics, Map data ©2022 Google.

Fig. 4

Timeline of sensor deployment in Antwerp, with sensor ID on left axis and location ID displayed in the colored bars that represent the different phases over time.

Fig. 5

Sensor deployment sites in Oslo. Imagery ©2022 Google, Imagery ©2022 TerraMetrics, Map data ©2022 Google.

Fig. 6

Timeline of sensor deployment in Oslo, with sensor ID on left axis and location ID displayed in the colored bars that represent the different phases over time.

Fig. 7

Sensor deployment sites in Zagreb. Imagery ©2022 Maxar Technologies, CNES/Airbus, Maxar Technologies, Google, Airbus, Imagery ©2022 TerraMetrics, Map data ©2022 Google.

Fig. 8

Timeline of sensor deployment in Zagreb, with sensor ID on left axis and location ID displayed in the colored bars that represent the different phases over time.

Table 3 Total number of AirSensEUR sensor systems and their distribution at the AQMSs and dedicated sites during the deployment in the three cities.

The file metadata_sites.csv11 contains the metadata of the sampling sites in the three cities, including the distance to the road and an indication of traffic intensity. For Antwerp, the traffic intensity (vehicles per hour) was based on modelled data from the Department MOW (Mobiliteit en Openbare Werken, or Mobility and Public Works), calculated from the annually averaged traffic density over all hours of the day; actual vehicle numbers during peak hours may therefore be much larger. For Oslo, the daily averaged traffic density was first estimated using a traffic model and then converted to an averaged hourly traffic density, to be consistent with the Antwerp data. For Zagreb, quantitative traffic density information was not available; qualitative estimates were provided instead.
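As an illustration of the Oslo conversion, assuming the daily total is simply spread evenly over 24 hours (the source does not give the exact method), the Kirkeveien AQMS with ca. 15,000 vehicles per day would correspond to an average of 625 vehicles per hour. The helper name below is hypothetical.

```python
def mean_hourly_from_daily(vehicles_per_day: float) -> float:
    # Assumption: even spread of the daily total over 24 hours
    return vehicles_per_day / 24.0


print(mean_hourly_from_daily(15_000))  # 625.0
```

As for Antwerp, peak-hour numbers would be considerably higher than such an average.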

Detailed information on each sampling site is given in a PDF file (metadata_sampling_site_description.pdf11).

Conditions during co-location and deployment

The ambient conditions and concentrations during the co-location campaigns and deployment showed broad ranges.

Meteorological conditions

A broad range of atmospheric conditions was covered during co-location and deployment in the three cities. An overview of the conditions is summarized in Table 4.

Table 4 Ranges of atmospheric conditions during the co-locations and deployment in the three cities ([min, max] hourly values for Oslo and Antwerp; daily values for Zagreb).

During the co-location periods, the hourly temperature ranged between −1 and 28 °C in Antwerp and between −3 and 26 °C in Oslo. The daily averages in Zagreb were between 0 and 26 °C. The hourly relative humidity ranged between 25 and 100% in Oslo, between 21 and 99% in Antwerp and between 44 and 89% in Zagreb (daily values).

During the deployment, the hourly temperature ranged between −7 and 39 °C in Antwerp and between −14 and 12 °C in Oslo. The daily averages in Zagreb were between −3 and 27 °C. The hourly relative humidity ranged between 29 and 100% in Oslo, between 20 and 100% in Antwerp and between 33 and 97% in Zagreb (daily values).

The deployment in Oslo did not cover the summer period, resulting in a narrower range in atmospheric conditions.

Pollutant concentrations

The air pollutant concentrations measured at the AQMSs during the co-location campaigns covered broad ranges, with maximum hourly concentrations of 175 µg/m³ O3 in Zagreb, 114 µg/m³ NO in Oslo, 139 µg/m³ NO2 in Antwerp and 152 µg/m³ PM10 in Antwerp.

During the deployment, the concentrations measured at the AQMSs also covered broad ranges, with maximum hourly concentrations of 241 µg/m³ O3 in Zagreb, 292 µg/m³ PM2.5 in Oslo, and 125 µg/m³ NO2 and 312 µg/m³ NO in Antwerp.

Notable differences in concentrations were observed between the two co-location periods and the deployment period, as well as between the cities.

Data Records

The data are publicly available and can be freely accessed from Zenodo11. The dataset consists of the data files (directory dataset) and the metadata (directory metadata).

Sensor and reference data

One data file is supplied for each AirSensEUR sensor system, in comma-separated CSV format without quotes. The data files are named “City_ASE_ID.csv”, where City is the city where the AirSensEUR sensor system was deployed, ASE stands for AirSensEUR and ID is a unique identifier of each sensor system. The data files are in wide format, with one row for each minute in which the AirSensEUR sensor system recorded data from any of its sensors. Within each row, missing data are reported as empty fields. Each row includes the one-minute raw sensor data, reference data, meteorological data (temperature, relative humidity and atmospheric pressure), date, time and location. The column headers present in the datasets are listed in Tables 5–7, with descriptions and units for sensor and reference data. The datasets also include quality flags for sensor data, as described in the section Technical Validation.

Table 5 Description of date format, location and meteorological data present in all datasets of the AirSensEUR sensor systems.
Table 6 Description of air pollutant sensor data present in all datasets of the AirSensEUR sensor systems.
Table 7 Description of air pollutant reference data present in all datasets of the AirSensEUR sensor systems with their coordinates, temperature, relative humidity, and atmospheric pressure at their location.
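A minimal Python sketch for reading one data file with the standard csv module is given below. It parses the City_ASE_ID.csv naming convention and turns empty fields (missing data) into None, so that missing values cannot be mistaken for zeros. The function names, the example file name and the column names in the usage comment are hypothetical; the actual headers are those listed in Tables 5–7.

```python
import csv
import re
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Pattern for "City_ASE_ID.csv"; City and ID are assumed to contain no underscores
FILE_PATTERN = re.compile(r"^(?P<city>[^_]+)_ASE_(?P<box_id>[A-Za-z0-9]+)\.csv$")


def parse_file_name(path: str) -> Dict[str, str]:
    """Return the city and sensor-system ID encoded in a data file name."""
    match = FILE_PATTERN.match(Path(path).name)
    if match is None:
        raise ValueError(f"not a City_ASE_ID.csv file name: {path!r}")
    return match.groupdict()


def read_rows(lines: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Parse the CSV lines of one data file; empty fields (missing data) become None."""
    reader = csv.DictReader(lines)
    return [{key: (value if value != "" else None) for key, value in row.items()}
            for row in reader]


# Typical use (hypothetical path):
# with open("dataset/Antwerp_ASE_40641B.csv", newline="") as f:
#     rows = read_rows(f)
```

Values are kept as strings so that users can choose their own numeric and date-time conversions.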

In addition to the mass concentrations, particle numbers per bin from the Palas Fidas 200 are supplied for the co-location site ANT_REF_R801 during the two co-location periods and the deployment period in Antwerp (ANT_REF_R801_FIDAS_UTC.csv), and for OSL_REF_KVN in Oslo during the first co-location period (OSL_REF_KVN_Fidas_UTC.csv). The files are comma separated and contain minute data. Their content is described in Table 8. Missing and invalid data are indicated by empty cells, whereas 0 indicates that no particles were counted.

Table 8 Description of data present in ANT_REF_R801_FIDAS_UTC.csv and OSL_REF_KVN_Fidas_UTC.csv.
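When reading the Fidas files, an empty cell and a 0 must be kept apart: treating empty cells as zero would bias any total. The hypothetical helpers below sketch this for one minute of bin values. Which columns hold the bins is described in Table 8, and the bin diameters are given in metadata_Fidas_um.csv.

```python
from typing import Optional, Sequence


def parse_count(cell: str) -> Optional[float]:
    """Empty cell -> None (missing or invalid); '0' -> 0.0 (no particles counted)."""
    cell = cell.strip()
    return None if cell == "" else float(cell)


def total_count(bin_cells: Sequence[str]) -> Optional[float]:
    """Sum the per-bin values of one minute; None if any bin is missing or invalid."""
    values = [parse_count(cell) for cell in bin_cells]
    if any(v is None for v in values):
        return None
    return sum(v for v in values if v is not None)
```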

Metadata

Five metadata files are provided to describe:

  • the sensors used in the AirSensEUR sensor systems (metadata_sensors.csv);

  • the brand names of the reference analysers used at all sampling sites (metadata_sites.csv);

  • additional data of the sampling sites, including e.g. location description, positioning of the sensor systems, picture of deployment (metadata_sampling_site.pdf);

  • the sampling dates of the feasibility study, pilot studies, first and second co-location and deployment for all AirSensEUR sensor systems (metadata_dates.csv);

  • and the diameters of particles associated with each bin (metadata_Fidas_um.csv).

The description of metadata in metadata_sensors.csv file is given in Table 1.

The description of metadata in metadata_sites.csv file is given in Table 9.

Table 9 Metadata for sampling sites.

The description of metadata in metadata_dates.csv is given in Table 10.

Table 10 Metadata for sampling periods.

Technical Validation

Quality assurance/control (QA/QC) procedures

During deployment, the sensor systems were regularly checked. In some cases, sensors had to be replaced or cleaned. For the reference data, common QA/QC procedures were applied, consistent with the objectives of the European air quality directive (2008/50/EC) and in conformity with internationally accepted standards (EN ISO/IEC 17025). This means that the reference monitors at the AQMSs are serviced and calibrated on a regular basis and that their measurement uncertainties meet the Data Quality Objectives of the directive. The QA/QC described in the relevant CEN standards (CEN 14211:2012 for NO/NO213, CEN 14626:2012 for CO14, CEN 14625:2012 for O315 and CEN 16450:2017 for PM2.5 and PM1016) was applied. Inconsistent sensor data were flagged: data were flagged when certain threshold values were exceeded, indicating unreliable results. In some cases, data were flagged manually based on field knowledge, without any threshold being exceeded. The principle of data flagging is described below.

Data collection and data flagging

Low-cost sensors may occasionally supply inconsistent data, e.g., before reaching equilibrium, when used outside their operating range of temperature or humidity, under other extreme conditions, or simply while being transported. A quality assurance and quality control (QA/QC) procedure for the sensor data is therefore necessary. The following set of QA/QC and filtering steps is suggested; it was used to produce the quality flags in the datasets.

All dataset files contain columns with sensor data quality flags, which indicate the results of the QA/QC procedures applied to the sensor data. The flags allow users to filter sensor data so that only robust data are used, or to compare the output of their own filtering procedures with the one provided with the data. The flag columns are named Sensor_Flag, where Sensor is one of: CO_A4_P1, D300, NO_B4_P1, NO2_B43F_P1, OX_A431_P1, 5301CAT, 5301CST, 5325CAT, 5325CST, 5310CAT, 5310CST, OPCN3PM1, OPCN3PM25 or OPCN3PM10. The data flags can contain the following labels:

  • empty label: indicates valid raw data after all QA/QC procedures have been applied;

  • “W” indicates data flagged during the warm-up of the AirSensEUR sensor system after a cold start, any reboot of the system or a restart of AirSensEUR data acquisition. A warm-up time is required for the sensors to reach their full response capacity. The suggested warm-up time for each sensor is given in the row “Warming” of Table 11;

    Table 11 Parameters of the data filtering procedure.
  • “T.min” or “T.max” and/or “Rh.min” or “Rh.max” indicate data outside the temperature and/or relative humidity limits. These four thresholds were determined empirically, either from experience or from laboratory experiments4,17,18. Extreme temperature and humidity may affect sensor performance, resulting in inaccurate, noisy and/or questionable data. Upper and lower bounds of temperature and relative humidity were therefore set to filter out sensor data, as sensors may behave incorrectly outside these bounds; e.g., the OPC-N3 overestimates PM mass concentration at high relative humidity. The suggested limits of acceptability for temperature (T.min and T.max) and relative humidity (rh.min and rh.max) are given in Table 11.

  • “Low_values” or “High_values” indicate data lower than the minimum acceptable value (Min_values in Table 11) or higher than the maximum acceptable value (Max_values in Table 11). Both limits correspond to the range of the AirSensEUR data acquisition, the operational range of the sensors or physically implausible air pollution levels.

  • “OutliersMin” or “OutliersMax” indicate data flagged by the outlier filtering procedure. Occasional outliers in sensor data may occur for several reasons. Outliers were detected at every xi in the dataset using a Hampel filter based on the median absolute deviation MADi, computed with Eq. 1 over a rolling time window centred on xi that includes all xj values within the time Window (see Window in Table 11). MADi was then multiplied by the Threshold factor (see Threshold in Table 11) to determine the limits of acceptance for xi. The Threshold factors were set to 20 for all sensors except CO2 (8), and CO and NO (75 each), for which the concentration levels can change rapidly. For any sensor data xi below the lower limit defined in Eq. 2, OutliersMin was added to the Sensor_flag column; for any xi above the upper limit defined in Eq. 3, OutliersMax was added. A critical point of MAD-based outlier detection is the choice of the time window, which determines whether spikes in the data are recognised as real or as measurement outliers. The time window was set to 3 h (181 data points).

$${{\rm{MAD}}}_{{\rm{i}}}={\rm{median}}\left|{{\rm{x}}}_{{\rm{j}}}-{\rm{median}}\left({{\rm{x}}}_{{\rm{j}}}\right)\right|{\rm{in}}\;{\rm{a}}\;{\rm{rolling}}\;{\rm{window}}\;{\rm{of}}\;{{\rm{x}}}_{{\rm{j}}}\;{\rm{data}}$$
(1)
$${\rm{OutliersMin}}:{{\rm{x}}}_{{\rm{i}}} < {\rm{median}}\left({{\rm{x}}}_{{\rm{j}}}\right)-{\rm{threshold}}\;{{\rm{MAD}}}_{{\rm{i}}}$$
(2)
$${\rm{OutliersMax}}:{{\rm{x}}}_{{\rm{i}}} > {\rm{median}}\left({{\rm{x}}}_{{\rm{j}}}\right)+{\rm{threshold}}\;{{\rm{MAD}}}_{{\rm{i}}}$$
(3)
  • “Inv” indicates sensor data flagged as invalid. A few data periods were flagged manually because they corresponded to moves of the sensor systems, unknown sampling locations, periods of maintenance or calibration of the reference analysers, or sensor malfunctions (e.g., insects in the OPC, ageing of a chemical sensor, general sensor failure). “Inv” is sometimes added to the flag even though the sensor data are correct, because during maintenance or calibration of the reference analysers, reference and sensor data should not be compared.
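The outlier step of Eqs. 1–3 can be sketched in Python as follows. This is an illustrative re-implementation, not the code used to produce the flags in the dataset, and it may differ in edge handling. It follows the equations as written (unscaled MAD, no 1.4826 factor) and assumes the series has first been placed on a regular one-minute grid with missing minutes as None, since the data files contain rows only for minutes with data. The defaults reflect the text: a threshold of 20 and a half-window of 90 values, i.e. 181 points or 3 h.

```python
from statistics import median
from typing import List, Optional, Sequence


def hampel_flags(x: Sequence[Optional[float]], half_window: int = 90,
                 threshold: float = 20.0) -> List[str]:
    """Label each x_i 'OutliersMin', 'OutliersMax' or '' following Eqs. 1-3.

    x must be on a regular one-minute grid (None = missing minute); half_window=90
    gives a centred window of 181 values, i.e. 3 h. Missing values are ignored in
    the window statistics and are never flagged.
    """
    flags: List[str] = []
    for i, xi in enumerate(x):
        if xi is None:
            flags.append("")
            continue
        start = max(0, i - half_window)
        window = [v for v in x[start:i + half_window + 1] if v is not None]
        med = median(window)                        # median(x_j)
        mad = median(abs(v - med) for v in window)  # Eq. 1 (unscaled MAD)
        if xi < med - threshold * mad:              # Eq. 2
            flags.append("OutliersMin")
        elif xi > med + threshold * mad:            # Eq. 3
            flags.append("OutliersMax")
        else:
            flags.append("")
    return flags
```

Note that in a window of identical values MADi is 0, so any deviation from the median is flagged; this follows directly from Eqs. 2 and 3.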

For sensor data that fail two or more of the criteria listed above, the Sensor_flag field consists of the concatenated flag labels, separated by commas and enclosed in quotes.
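Because the concatenated labels are enclosed in quotes, a standard CSV parser returns them as a single field that can then be split on commas. The helpers below are a hypothetical sketch, and the example row is invented for illustration. Tolerating a label is a user decision; note that “Inv” covers both reference maintenance and genuine sensor malfunctions, so it should not be ignored blindly.

```python
import csv
from typing import FrozenSet


def flag_labels(field: str) -> FrozenSet[str]:
    """Split a Sensor_Flag field such as 'W,T.max' into its labels (empty = valid)."""
    return frozenset(label.strip() for label in field.split(",") if label.strip())


def keep(field: str, tolerated: FrozenSet[str] = frozenset()) -> bool:
    """True if the row is valid, or carries only labels the user chooses to tolerate."""
    return flag_labels(field) <= tolerated


# The csv module removes the enclosing quotes, so the concatenated flag arrives as one field
row = next(csv.reader(['2020-04-02 10:00:00,123.4,"W,T.max"']))
print(row)  # ['2020-04-02 10:00:00', '123.4', 'W,T.max']
```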

For the OPC-N3 sensors, rh.max was initially set to 70%, as suggested by the manufacturer. Based on testing, it was later set to 100% in order to keep all data for possible later calibration with Köhler models18,19,20, which require data at relative humidity above 70% to achieve the best possible fit. Several tests were performed to determine rh.max for multi-linear and Köhler fittings; the best predictions were obtained with rh.max of 70% and 100%, respectively.
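To illustrate why data above 70% RH matter for Köhler fitting, the sketch below uses one widely used simplified κ-Köhler correction for optical PM sensors: a growth factor C = 1 + (κ/ρ)/(1/a_w − 1), with water activity a_w = RH/100 and ρ an assumed particle density (1.65 here), and corrected PM = raw PM / C. This form is our assumption of a typical model; the exact models of refs 18–20 may differ, and κ would be fitted to co-location data (0.33 below is an arbitrary example). Because C departs strongly from 1 only at high RH, such data carry most of the information for fitting κ.

```python
def growth_factor(rh_percent: float, kappa: float, density: float = 1.65) -> float:
    """C = 1 + (kappa / density) / (1 / a_w - 1), with water activity a_w = RH / 100."""
    if not 0.0 < rh_percent < 100.0:
        # C diverges as RH approaches 100 %, so such rows need separate handling
        raise ValueError("RH must be strictly between 0 and 100 %")
    a_w = rh_percent / 100.0
    return 1.0 + (kappa / density) / (1.0 / a_w - 1.0)


def corrected_pm(pm_raw: float, rh_percent: float, kappa: float) -> float:
    """Divide the raw (humidity-affected) PM value by the growth factor."""
    return pm_raw / growth_factor(rh_percent, kappa)


for rh in (50.0, 70.0, 90.0, 95.0):
    print(rh, round(growth_factor(rh, kappa=0.33), 2))
```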

All filtering parameter values are given in Table 11. They are mainly derived from experience (i.e., Warming, Window and Threshold). Note that the values of T.min, T.max, rh.min and rh.max do not discard many outliers. However, these parameters could be set to more stringent values, which could be useful, for example, to filter out data at high relative humidity for PM sensors or at high temperatures that affect NO_B4 sensors. The Min_values and Max_values for sensors CO_A4, NO_B4, NO2_B43F and OX_A431 are constrained by the electronics of the AirSensEUR data acquisition and should not be changed. Conversely, the Min_values and Max_values for sensors D300, PMS5003, OPC-N3, BMP280, SHT31HE, SHT31HI, SHT31TE and SHT31TI are set according to expected readings and can be fine-tuned to discard outliers. Finally, most values given in Table 11 are not absolute rules, and data users can experiment with new values to improve the filtering procedure.
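Re-flagging with user-chosen limits can be sketched as below. The labels mirror those in the flag columns, but the Limits values in the example are hypothetical placeholders (a stricter rh_max of 70% for a PM sensor, as for multi-linear calibration), not the values of Table 11. Which temperature and humidity readings to test against (inside or outside the box) is left to the user.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Limits:
    """Filtering limits for one sensor; take the defaults from Table 11."""
    t_min: float
    t_max: float
    rh_min: float
    rh_max: float
    min_value: float
    max_value: float


def range_labels(value: Optional[float], temp: Optional[float],
                 rh: Optional[float], lim: Limits) -> List[str]:
    """Labels a minute would receive under lim (missing inputs are not flagged)."""
    labels: List[str] = []
    if temp is not None and temp < lim.t_min:
        labels.append("T.min")
    if temp is not None and temp > lim.t_max:
        labels.append("T.max")
    if rh is not None and rh < lim.rh_min:
        labels.append("Rh.min")
    if rh is not None and rh > lim.rh_max:
        labels.append("Rh.max")
    if value is not None and value < lim.min_value:
        labels.append("Low_values")
    if value is not None and value > lim.max_value:
        labels.append("High_values")
    return labels


# Hypothetical, stricter PM limits (rh_max = 70 %)
pm_limits = Limits(t_min=-20.0, t_max=40.0, rh_min=0.0, rh_max=70.0,
                   min_value=0.0, max_value=1000.0)
print(range_labels(25.0, 20.0, 85.0, pm_limits))  # ['Rh.max']
```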

Usage Notes

For users who wish to compare minute-level sensor data with minute-level reference data, a lag between the sensor and reference time series can be an obstacle. Although AirSensEUR time series refer to Coordinated Universal Time obtained from GPS, GPRS or WiFi, this does not exclude differences in response time between LCSs and reference analysers, or other errors in the reported time of the data series. Before any data treatment, it is therefore strongly suggested to apply a lag correction to the sensor and reference data series being studied. The lag between two data series can be estimated from the output of the cross-correlation function (CCF)21; the maximum of the CCF can be found using the Find_Max_CCF function in the “151016 Sensor_toolBox.R” file (https://github.com/ec-jrc/airsenseur-calibration).
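The lag estimation can also be sketched in Python as a brute-force search for the shift that maximises the Pearson correlation between the two series. This is a hypothetical stand-in written for illustration, not a port of Find_Max_CCF. It assumes both series have been placed on the same regular one-minute grid, with missing minutes as None.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _pearson(pairs: List[Tuple[float, float]]) -> Optional[float]:
    """Pearson correlation of (x, y) pairs; None if either series is constant."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0.0 or syy == 0.0:
        return None
    return sxy / math.sqrt(sxx * syy)


def best_lag(sensor: Sequence[Optional[float]], reference: Sequence[Optional[float]],
             max_lag: int = 30) -> int:
    """Shift k (in samples) maximising corr(sensor[t + k], reference[t]).

    A positive k means the sensor series lags the reference by k samples.
    """
    if len(sensor) != len(reference):
        raise ValueError("series must be on the same time grid")
    n = len(sensor)
    best_k, best_r = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        pairs = [(sensor[t + k], reference[t]) for t in range(n)
                 if 0 <= t + k < n and sensor[t + k] is not None
                 and reference[t] is not None]
        if len(pairs) < 3:
            continue
        r = _pearson(pairs)
        if r is not None and r > best_r:
            best_k, best_r = k, r
    return best_k
```

Once k is known, sensor value sensor[t + k] is compared with reference value reference[t]. Restricting max_lag to a plausible range (here 30 minutes, an arbitrary choice) avoids spurious maxima from periodic patterns such as daily cycles.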