1 Introduction

The need to monitor air quality variables has increased in the last decade, due to the high levels of pollutants that affect human health, especially in large urban areas (World Health Organization Regional Office for Europe, 2017). This fact has led to the development of different types of monitoring systems, which are considered a cornerstone in the implementation of strategies to mitigate pollution. The goal of these systems is to monitor air quality variables in order to provide authorities and citizens with important information about the current levels of gases and particles in different areas of a city. This information can be used to make decisions aimed at preventing the negative impact of these pollutants on human health. With the aim of utilizing the monitoring data to assess, predict, and reduce pollutant levels, environmental agencies have developed regulations that include key aspects, such as data quality objectives and indicators that must be met (UNION et al., 2008; EPA, 2017).

Traditionally, air quality monitoring systems consist of a set of expensive, robust stations that require on-site calibration and maintenance. Due to the high cost, the number of monitoring stations is usually low, leading to a low spatial resolution of the data (Röösli et al., 2000; Lin et al., 2020b). In the past few years, however, the development of monitoring systems has gone hand in hand with the development of new technological paradigms such as the Internet of Things (IoT), thus allowing the deployment of a larger number of air quality monitoring systems (Múnera et al., 2021). IoT is a paradigm that allows connectivity and information exchange between heterogeneous, uniquely identifiable objects, in order to capture and process information ubiquitously for decision-making and action in a given context (Atzori et al., 2017). Hence, IoT-based air quality monitoring systems use low-cost sensors, enabling the massive deployment of sensor systems at lower cost and allowing permanent, real-time access to the gathered data.

The data generated by IoT systems, however, have been considered unreliable for two main reasons. On the one hand, these systems utilize low-cost sensors, which lack the accuracy and precision of robust stations. On the other hand, they are exposed to many endangering factors, since their applications usually involve wide deployments and open platforms (Karkouch et al., 2016; Liu et al., 2019). These conditions have led to a significant concern regarding the data reliability and trustworthiness of IoT-based monitoring systems. Particularly, in the context of air quality monitoring, several researchers argue that the use of low-cost sensors is generating unreliable data (Kumar et al., 2015; Castell et al., 2017; Manikonda et al., 2016). This situation poses a new challenge in the context of smart cities and IoT: it is necessary to assess and improve the quality of the data obtained through IoT systems, in order to establish their reliability and provide useful information to decision makers.

The study of Data Quality (DQ) emerged from the field of information systems, where large amounts of data need to be stored in databases and managed by such systems. Wang (1996) proposed the set of dimensions that are most important to data consumers in this field. Because of the importance of data, the concept has since been adopted by other applications and fields. Specifically, in the context of IoT systems, the analysis of DQ has become relevant in order to guarantee the reliability of the data provided to decision makers. Liu et al. (2019) and Karkouch et al. (2016) conducted, respectively, a systematic literature review and a state-of-the-art review of DQ in IoT, discussing how DQ has been addressed in IoT applications. They also identified the challenges and most prominent research sub-fields of DQ in IoT, including the most commonly used dimensions, endangering factors, and enhancing methods.

DQ analysis in the field of air quality monitoring systems is a fairly new topic, since massive low-cost systems have become popular only in the past few years. Even though there are specific definitions of the DQ expected from these systems, provided by the EPA (2017) and the EU (UNION et al., 2008), the studies on this topic are limited to the dimensions addressed by the deployed solutions and do not consider the relationship between the DQ dimensions and the indicators suggested by the standardization entities. In this context, this study aims at providing an overview of how DQ has been addressed in the implementation of IoT-based air quality monitoring systems. Moreover, the goal is also to find the relationship between the Data Quality Indicators (DQI) and Data Quality Objectives (DQO) defined by the EPA, and the DQ dimensions traditionally used. Our contributions are summarized as follows.

  • We review and analyze the existing guidelines for assessing DQ in air quality monitoring systems in order to propose a mapping between the DQ indicators (from the guidelines) and the DQ dimensions (from the DQ field).

  • We analyze the main DQ enhancement techniques and identify how these techniques affect the DQ dimensions.

  • We develop a systematic mapping study to determine the state of the evaluation of DQ in IoT-based air quality monitoring systems. We use our proposed mapping between DQI, enhancement techniques and DQ dimensions to answer the research questions that guide our systematic mapping study.

  • We highlight some challenges that must be addressed in order to improve data quality in IoT-based air quality monitoring systems.

The document is organized as follows. Section 2 describes data quality principles. Section 3 presents data quality in the context of air quality, where a description of data quality indicators and objectives is discussed. Section 4 highlights the most common DQ enhancement techniques used in IoT-based air quality monitoring systems. Section 5 shows the steps for the systematic review process. Section 6 describes the results found in the systematic mapping study. Finally, Sections 7 and 8 present the discussion and conclusions, respectively.

2 Data Quality Principles

It is common to find a definition of DQ from the consumer’s point of view, a trend based on the treatment of data as a product. In Wang (1996), it is defined as “data that are fit for use by data consumers”; similar definitions are found in Karkouch et al. (2016) and Liu et al. (2019). According to Karkouch et al. (2016), the data consumer requires data to fulfill certain criteria that are essential for the task at hand. Since data is treated as a product, DQ is a multi-faceted concept, as users have different expectations of it. Thus, DQ analysis has been divided into dimensions, where each dimension stands for an attribute that is important to the data consumer or the application. After studying the term DQ in the field of IoT, we have identified several dimensions that can be relevant to the analysis of DQ.

Tables 1, 2, 3, 4, 5, and 6 present the most relevant DQ dimensions as well as their definitions and proposed evaluation metrics. In these tables, the first column is the dimension name, and the second column includes a short definition of the dimension, resulting from the review of several sources. A dimension may appear under several names, but its definition is consistent across sources. Finally, the third column shows a formula or metric to evaluate each DQ dimension (the DQ_dimension value), each adapted so that every value lies in the range between 0 and 1 (0 for low quality and 1 for high quality).

Table 1 DQ dimensions related to data values
Table 2 DQ dimensions related to the amount of data
Table 3 Time-related DQ dimensions
Table 4 DQ dimensions related to the relationship among data
Table 5 DQ Dimensions related to the system
Table 6 The validity DQ dimension

These dimensions can be classified into different categories. Table 1 shows the dimensions related to the specific value of the data (and its error): Precision, Accuracy, and Confidence. A second category of DQ dimensions is presented in Table 2, where the amount of data is considered; this category includes the Data volume, Completeness, and Redundancy dimensions. The third category gathers the time-related DQ dimensions, as presented in Table 3, and includes the Timeliness and Accessibility dimensions. Table 4 shows the dimensions that take into consideration the relationship among the data, such as Concordance, Artificiality, and Interpretability. Finally, Table 5 presents the last category, which considers the dimensions related to the system, specifically Utility, Trust, and Access security.

As stated earlier, the relevance of each dimension depends on the specific application of the system and on how the data is going to be utilized. In that sense, no single DQ value has been defined to decide whether the data should be used. The validity dimension, however, aims at providing the system with the flexibility to define which dimensions are relevant to the DQ of the specific context (see Table 6).
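
As an illustration of this normalization to the [0, 1] range, the short Python sketch below computes two dimension metrics, completeness and accuracy, on a toy PM2.5 series; the expressions are simplified assumptions in the spirit of Tables 1 and 2, not the exact formulas listed there.

    import numpy as np

    def completeness(values, expected_count):
        """Fraction of the expected samples that were actually received (0 = none, 1 = all)."""
        received = int(np.count_nonzero(~np.isnan(values)))
        return min(received / expected_count, 1.0)

    def accuracy(values, reference, max_error):
        """1 minus the normalized mean absolute error against a reference, clipped to [0, 1]."""
        mae = np.nanmean(np.abs(values - reference))
        return float(np.clip(1.0 - mae / max_error, 0.0, 1.0))

    # Toy hourly PM2.5 series (ug/m3) with two missing samples
    sensor = np.array([12.0, 14.5, np.nan, 13.2, 15.1, np.nan, 16.0, 14.8])
    station = np.array([11.5, 14.0, 13.0, 13.5, 15.5, 14.9, 15.8, 15.0])

    print(completeness(sensor, expected_count=8))     # 0.75
    print(accuracy(sensor, station, max_error=10.0))  # close to 1.0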

3 Data Quality in the Context of Air Quality Estimation

In the context of air quality monitoring systems, the European Parliament and the Council have established Data Quality Objectives and Data Quality Indicators in the Directive 2008/50/EC guideline (UNION et al., 2008), while the Environmental Protection Agency (EPA) in the USA proposed the Quality Assurance Handbook for Air Pollution Measurement Systems (EPA, 2017). These documents define Data Quality Objectives (DQO) as the accepted thresholds for the Data Quality Indicators (DQI), i.e., the attributes of data quality. A close examination of these guidelines makes it possible to identify some of these indicators and match them to the DQ dimensions previously discussed. We present below each DQI and its relation to the DQ dimensions.

  • Uncertainty: According to JCGM (2008), it is “a parameter associated with the result of a measurement that characterizes the dispersion of the values that could be reasonably attributed to the measurand.” The authors also state that uncertainty is a generic term used to describe the sum of all sources of error associated with an environmental data operation. Uncertainty has two components, namely population uncertainty and measurement uncertainty. The former is related to the representativeness of the sample, while the latter is related to precision, bias, and the detection limit (EPA, 2017). Regarding the DQO for particulate matter pollutants, the maximum allowed uncertainty for fixed measurements (i.e., robust monitoring stations) is 25%, while for indicative measurements (e.g., low-cost sensor measurements) it is 50% (UNION et al., 2008). Based on this definition, this indicator is related to the accuracy and confidence dimensions.

  • Minimum data capture: It has a limit of 90%, which means that the maximum number of missing values within one measurement period is 10% of the expected values (UNION et al., 2008). This indicator is related to the completeness dimension.

  • Minimum time coverage: For pollutants such as particulate matter (PM10/PM2.5), this indicator has a limit of 14% (one random 1-day measurement per week, evenly distributed over the year, which results in roughly 52 1-day measurements per year, or 8 weeks evenly distributed over the year, which results in roughly 56 1-day measurements per year) (UNION et al., 2008). This indicator is related to the timeliness and completeness dimensions.

  • Minimum number of sampling points: This indicator is defined in UNION et al. (2008) for fixed measurements, and it depends on the population of the specific area. For instance, a zone such as the Aburra Valley in Antioquia, Colombia, with about 4 million inhabitants in 2020 (Proantioquia et al., 2020), requires a minimum of 11 sampling points. This indicator is related to the data volume dimension.

  • Precision: It represents the random component of error and is a measure of agreement among repeated measurements of the same property under identical or very similar conditions (EPA, 2017). It is usually estimated from the standard deviation. This indicator is part of the uncertainty components and matches the precision DQ dimension.

  • Bias: This indicator is a component of the uncertainty and represents the systematic distortion of a measurement process that causes error in one direction. It is determined by the estimation of positive and negative deviation from the true value (EPA, 2017). This definition matches the accuracy DQ dimension.

  • Detection limit: It is the minimum concentration of a pollutant that can be distinguished from zero (absence of the pollutant) by a single measurement at a stated level of probability (EPA, 2017). This indicator can be sorted within the validity DQ dimension.

  • Accuracy: It is defined as a data quality indicator in EPA (2017): a “measure of the overall agreement of a measurement to a known value and includes a combination of random error (precision) and systematic error (bias) components of both sampling and analytical operations.” The guide recommends using bias and precision when possible; otherwise, accuracy is used as the measurement uncertainty. This indicator matches the dimension of the same name.

  • Representativeness: In the handbook (EPA, 2017), it is defined as a measurement of the population component of uncertainty and refers to “the degree to which data accurately and precisely represents the frequency distribution of a specific variable in the population”. According to the guide, it does not matter how precise or unbiased the measurement values are if a site is unrepresentative of the population it is presumed to represent. Representativeness depends on factors such as the number of sampling points (network size), the frequency of sampling, and the sampling schedule. Thus, this indicator can match the timeliness and data volume DQ dimensions, as well as the “minimum number of sampling points” and “minimum time coverage” indicators discussed in the guide (UNION et al., 2008).

  • Comparability: In the EPA handbook (EPA, 2017), this indicator is defined as “a measure of the confidence with which one dataset or method can be compared to another, considering the units of measurement and applicability to standard statistical techniques”. For example, if one dataset is retrieved from monitoring stations and another from low-cost sensors, both of them are expected to be comparable. This indicator can match the concordance DQ dimension.

  • Completeness: This indicator (from EPA (2017)) directly matches the definition of the completeness DQ dimension as the ratio of obtained valid data to expected data. The EPA requires at least 75% of the data to be complete. A minimal check of several of these DQOs is sketched after this list.
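
The following sketch checks a set of hypothetical indicator values against the DQO thresholds cited above (uncertainty of 25% for fixed and 50% for indicative measurements, minimum data capture of 90%, and EPA completeness of 75%); the indicator values themselves are invented for illustration and would come from an actual monitoring campaign.

    # DQO thresholds taken from the guidelines cited above (UNION et al., 2008; EPA, 2017)
    DQO_THRESHOLDS = {
        "uncertainty_fixed": 0.25,       # max relative uncertainty, fixed measurements
        "uncertainty_indicative": 0.50,  # max relative uncertainty, indicative (low-cost) measurements
        "min_data_capture": 0.90,        # minimum fraction of valid values per period
        "min_completeness": 0.75,        # EPA minimum completeness
    }

    def check_dqo(indicators, measurement_type="indicative"):
        """Return a pass/fail flag for each indicator, given the measurement type."""
        unc_limit = (DQO_THRESHOLDS["uncertainty_fixed"]
                     if measurement_type == "fixed"
                     else DQO_THRESHOLDS["uncertainty_indicative"])
        return {
            "uncertainty": indicators["uncertainty"] <= unc_limit,
            "data_capture": indicators["data_capture"] >= DQO_THRESHOLDS["min_data_capture"],
            "completeness": indicators["completeness"] >= DQO_THRESHOLDS["min_completeness"],
        }

    # Hypothetical campaign summary for a low-cost PM2.5 node
    print(check_dqo({"uncertainty": 0.42, "data_capture": 0.93, "completeness": 0.88}))
    # {'uncertainty': True, 'data_capture': True, 'completeness': True}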

4 DQ Enhancement Techniques

This section describes the most used data quality enhancement techniques in IoT-based air quality monitoring systems. We found four main categories, namely Data Calibration, Data Interpolation, Data Aggregation/Fusion, and Outlier Detection, as described below.

4.1 Data Calibration

Low-cost sensor calibration is essential because the collected data can be affected by noise and abnormalities. However, sensor manufacturers do not often provide direct means of sensor calibration, since the factory calibration is not intended for low concentrations and is performed under specific humidity and temperature settings (Hasenfratz et al., 2012). Moreover, a calibrated sensor can suffer from drift, since a deployment can last several years (Barcelo-Ordinas et al., 2018). Hence, automatic or additional calibration is needed in order to overcome these limitations. Common calibration approaches for low-cost air quality sensors are carried out in the laboratory with artificial pollutants, as well as in the field, where the sensors are located close to fixed reliable stations. Field calibration has the disadvantage of depending on weather conditions. Therefore, different reference measurements under several weather conditions (e.g., temperature and humidity settings) are needed for a more accurate calibration process (Hasenfratz et al., 2012).

Existing works have proposed new approaches based on traditional calibration. A node-to-node calibration approach was proposed in Kizel et al. (2018). It consists of calibrating only one sensor in a chain using reference measurements; the rest of the sensors are then calibrated sequentially, one against the other. This approach is suitable for distributed sensor networks. Another work uses Simple Linear Regression (SLR), Multiple Linear Regression (MLR), and Artificial Neural Networks (ANN) for calibration (Okafor et al., 2020). One feature (i.e., measurements from one sensor) is used in an SLR model, where each sensor is calibrated individually to adjust the bias. In contrast, the MLR and ANN models use all available features and a subset of features found by an Exhaustive Feature Selection method. Another approach is to place the sensor to be calibrated and the reference sensor in a hardboard box, as in Rajasegarar et al. (2014b), where the authors performed a cubic polynomial fit with minimum error. A similar procedure was performed in Carratu et al. (2020), where a particle generator was used and the sensors were previously synchronized; the authors also used cubic polynomial fitting for each sensor.
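
As an illustration of field (co-location) calibration, the sketch below fits a multiple linear regression that corrects a low-cost PM2.5 reading using temperature and humidity, in the spirit of the MLR approach of Okafor et al. (2020); the synthetic data and variable names are our own assumptions, not those of the cited work.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    raw_pm25 = rng.uniform(5, 80, n)       # low-cost sensor reading (ug/m3)
    temperature = rng.uniform(10, 35, n)   # degrees Celsius
    humidity = rng.uniform(30, 95, n)      # relative humidity (%)
    # Synthetic "reference" standing in for a co-located robust station
    reference = 0.8 * raw_pm25 + 0.3 * temperature - 0.15 * humidity + rng.normal(0, 2, n)

    # Fit reference = b0 + b1*raw + b2*temperature + b3*humidity by least squares
    X = np.column_stack([np.ones(n), raw_pm25, temperature, humidity])
    coeffs, *_ = np.linalg.lstsq(X, reference, rcond=None)

    def calibrate(raw, temp, hum):
        """Apply the fitted correction to a new low-cost reading."""
        return coeffs[0] + coeffs[1] * raw + coeffs[2] * temp + coeffs[3] * hum

    print(calibrate(raw=40.0, temp=25.0, hum=60.0))  # corrected PM2.5 estimate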

4.2 Data Interpolation

Data interpolation can be understood as the process of generating new data with the aim of improving the spatial or temporal resolution of a variable under supervision. Air quality monitoring at the local scale requires spatio-temporal integration to interpolate data. Urban environments can exhibit large variations at small scales, where traditional interpolation methods fail to obtain reliable data. One solution is the use of high-density networks of low-cost sensors in order to monitor the variable at the local scale (Alavi-Shoshtari et al., 2013). Low-cost sensors offer a finer resolution of spatio-temporal data, which can complement existing air quality monitoring stations. However, in order to address the quality of data from low-cost sensors, several interpolation methods have been proposed. Spatial interpolation is a common method used to predict spatio-temporal distributions outdoors. It relates air quality measurements to their locations in order to predict point-wise data, increasing data availability across space and time. Existing spatial interpolation algorithms include nearest neighbor, spatial averaging, inverse distance weighting, and Kriging. The most widely used is the Kriging method, which produces the best linear unbiased estimation of air quality data (Li et al., 2018).
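
The sketch below illustrates inverse distance weighting, one of the spatial interpolation methods listed above; the station coordinates and readings are illustrative, and a production system would typically prefer Kriging for the best linear unbiased estimate.

    import numpy as np

    def idw(query_xy, station_xy, station_values, power=2.0):
        """Estimate the value at query_xy from nearby station readings (inverse distance weighting)."""
        d = np.linalg.norm(station_xy - query_xy, axis=1)
        if np.any(d < 1e-9):                  # query point coincides with a station
            return float(station_values[np.argmin(d)])
        w = 1.0 / d**power
        return float(np.sum(w * station_values) / np.sum(w))

    # Four stations on a 1 km grid and their PM2.5 readings (ug/m3)
    stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    pm25 = np.array([22.0, 30.0, 18.0, 26.0])
    print(idw(np.array([0.4, 0.6]), stations, pm25))  # weighted toward the nearer stations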

4.3 Data Aggregation/Fusion

Data generated by several low-cost sensors can have uncertainties, since different sensors have different technical performance. Data from a single sensor cannot satisfy the needs in terms of resolution and accuracy. Hence, more accurate measurements can be obtained when data from different sensors (i.e., a multisensor system) are fused (Lin et al., 2020a). Data fusion was defined by the Joint Directors of Laboratories in 1991 as “the process of dealing with the association, correlation, and combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations and threats as well as their significance” (White, 1991).

Data fusion systems have the advantage of expanding coverage in terms of space and time, as well as improving performance and spatio-temporal resolution (Lin et al., 2020a). Calibration errors can be reduced by considering measurements from several sensors and multivariate regression, which helps reduce the uncertainty of the calibration parameters (Barcelo-Ordinas et al., 2018). Existing works have proposed data fusion methods. For example, a data fusion framework based on Optimum Linear Data Fusion theory (based on the least squares method) and the Kriging method (to estimate the spatio-temporal data) was proposed in Lin et al. (2020a). Another approach merges sensor data with environmental factors in a calibration equation by using linear regression and artificial neural networks (Okafor et al., 2020).
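
As a minimal illustration of least-squares fusion, the sketch below combines co-located sensor readings with inverse-variance weights; the sensor variances are assumed values, and the cited frameworks additionally handle the spatio-temporal estimation step.

    import numpy as np

    def fuse(estimates, variances):
        """Inverse-variance (least squares) fusion of several noisy estimates of the same quantity.

        Weights proportional to 1/variance minimize the variance of the fused estimate;
        returns the fused value and its variance.
        """
        estimates = np.asarray(estimates, dtype=float)
        variances = np.asarray(variances, dtype=float)
        weights = 1.0 / variances
        fused = np.sum(weights * estimates) / np.sum(weights)
        fused_var = 1.0 / np.sum(weights)
        return float(fused), float(fused_var)

    # Three co-located low-cost nodes measuring the same PM2.5 plume (assumed variances)
    value, var = fuse(estimates=[24.0, 27.5, 25.2], variances=[4.0, 9.0, 6.0])
    print(value, var)  # fused value lies between the inputs and has a lower variance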

4.4 Outlier Detection

Outliers refer to those data points that are far from the expected pattern in the data and cannot be explained by a model. They are not consistent in space and time with the remaining set of observations (Jain and Shah, 2017), and must be distinguished from noise and missing values (Chung & Kim, 2020). Outliers are also known as anomalies and can be caused by interference and sensor malfunction. The increasing number of IoT applications has imposed the need to develop accurate outlier detection methods, since the amount of collected data is also increasing. In addition, several IoT applications use low-cost sensors, which are more prone to outliers and high variability; indeed, it has been reported that low-cost sensor data suffer from low accuracy and precision, as well as low correlation with the reference (Fang & Bate, 2017b). Removing outliers from datasets at preprocessing stages improves the performance of machine learning algorithms (Chung & Kim, 2020). Several approaches have addressed outlier detection by using threshold values, where a data point greater than the threshold is considered an outlier. However, threshold selection is an issue, since it can be very subjective (Chen et al., 2018). Several solutions for outlier detection have been studied, such as density-based, distance-based, and neural-network-based methods (Huang et al., 2020b).
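
As a simple, robust alternative to a fixed threshold, the sketch below flags outliers using the median absolute deviation (MAD); the 3.5 cutoff is a common rule of thumb and not a value prescribed by the reviewed works.

    import numpy as np

    def mad_outliers(values, cutoff=3.5):
        """Flag points whose modified z-score (based on the median absolute deviation) exceeds the cutoff."""
        values = np.asarray(values, dtype=float)
        median = np.median(values)
        mad = np.median(np.abs(values - median))
        if mad == 0:
            return np.zeros(values.shape, dtype=bool)
        modified_z = 0.6745 * (values - median) / mad
        return np.abs(modified_z) > cutoff

    pm25 = np.array([14.2, 15.0, 13.8, 14.6, 95.0, 15.1, 14.9])  # one obvious spike
    print(mad_outliers(pm25))  # [False False False False  True False False]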

4.5 Relation Between DQ Enhancement Techniques and Dimensions

Figure 1 depicts the relationship between air quality indicators, data quality dimensions, and data quality enhancing techniques. This mapping helps to identify which air quality sensor data attributes are related to the IoT data quality ones, as well as which techniques are commonly used to improve those attributes. For example, accuracy can be improved by sensor calibration, data interpolation, outlier detection, and data fusion techniques. As shown, not all dimensions have a counterpart on the indicator side, and not all dimensions can be improved by the DQ enhancing techniques explored in the literature. How the enhancement techniques are related to the DQ dimensions is explained as follows.

Fig. 1 Mapping air quality DQ indicators, DQ dimensions, and DQ enhancing techniques, where C is calibration, I is interpolation, O is outlier detection, and F is data fusion/aggregation

Data calibration involves correcting the measurements taken in order to improve the accuracy of the variable. Thus, the accuracy dimension is directly affected, as the calibration process seeks to improve this dimension. When an air quality monitoring system implements a calibration process, it enhances the confidence in the system, thus affecting the confidence dimension. If the calibration process takes into account the variability of the measurement, the precision dimension is also improved. On the other hand, if the calibration process involves an additional reference measurement (e.g., robust calibrated sensors, particle generators), the concordance dimension is affected, since the sensor being calibrated is compared to a reference in order to apply a correction mechanism.

Data interpolation creates new data points in order to fill spatial or temporal gaps, thereby improving the completeness of the original samples as well as the volume of captured data. Interpolated data is normally created using mathematical or machine learning models, thus increasing artificiality. Interpolated data is usually compared with at least one reference in order to estimate the error, and hence the accuracy dimension is altered. The precision dimension is affected if the variability or standard deviation of the interpolated data is evaluated. Data interpolation is also related to the confidence dimension if confidence intervals are calculated and the interpolated data falls within them. When the correlation between interpolated data and nearby (spatial or temporal) real data is computed, the concordance dimension is considered.

Data aggregation/fusion techniques are related to several data quality dimensions. The accuracy of measurements is improved when poor-quality data are fused or aggregated with good-quality data. The fused or aggregated data can have a variability different from that of the sources, depending on the technique used, thus affecting precision and increasing data artificiality. The completeness dimension is altered if new data is included in the fusion or aggregation process. New data also contributes to the data volume dimension, where incomplete datasets can be merged in order to obtain a more complete fused dataset. Moreover, the redundancy dimension changes if the aggregation technique uses redundant data, and the confidence dimension is affected if an error is estimated with a specific confidence interval. Additionally, the concordance dimension is modified if the techniques include the correlation of multiple measurements. Finally, the validity dimension is altered if fused or aggregated data is contrasted with ground truth data.

As outlier detection techniques aim to identify data that is not consistent with the other observations, they can reduce the error and variability of the data, thereby improving its accuracy, precision, and confidence, while increasing its reliability. Furthermore, if outlier detection involves the removal of anomalous data, it directly impacts the completeness of the dataset and also reduces its volume. Detecting whether anomalous data are related to errors or to important events can also be achieved using concordance metrics; hence, this dimension is related to the technique as well. With these DQ concepts in mind, we present below the design and the results of the systematic mapping proposed in this work.

5 Systematic Mapping Method

A systematic mapping study is a well-organized and frequently used methodology to synthesize the state of the art of a particular research area. This type of study looks for the “big picture” of a research topic, showing the branches and challenges associated with it (James et al., 2016). This approach has been used mainly in software engineering; its application in the IoT field, however, has been modest.

In this document, a systematic mapping study is developed based on the guidelines proposed by Petersen et al. (2008). A set of steps was established to identify and analyze the studies about data quality in IoT-based air quality monitoring systems. We define the following steps for developing the systematic mapping study:

  1. Research questions: In this step, the research questions are defined. These questions are expected to be answered when the systematic mapping process is completed.

  2. Search strategy: This step defines the methodology of the search, starting with the definition of the “search chain” that will be applied to relevant academic databases.

  3. Selection criteria: Inclusion and exclusion criteria are defined in this step. These criteria are used to filter the studies found in the previous step.

  4. Data extraction: Once the search strategy and selection criteria are applied, information relevant to the research questions is extracted from the selected articles.

  5. Analysis: In this step, we analyze the results obtained in order to draw conclusions from the mapping study.

5.1 Research Questions

We develop this study to identify the state of the art on how data quality is addressed in IoT-based air quality monitoring systems. Hence, we define five research questions (RQs) that guide our review of the literature in this field.

  • RQ#1: Which are the most relevant DQ dimensions related to IoT-based air quality monitoring systems?

  • RQ#2: What are the most used strategies to mitigate data quality problems in IoT-based air quality monitoring systems?

  • RQ#3: What are the system’s features that threaten data quality in IoT-based air quality monitoring systems?

  • RQ#4: How is data quality estimated for IoT-based air quality monitoring systems?

  • RQ#5: How is degradation of data quality identified in IoT-based air quality monitoring systems?

5.2 Search Strategy

The research questions are used to identify the four main keywords of our search: “Air Quality,” “Monitoring,” “Data Quality,” and “Internet of Things.” Then, we assemble the search query by including new terms derived from variations of these keywords. Table 7 presents the search query; for each main keyword, we define a corresponding sub-query that contains all its variants. We use the AND logical operator to connect the resulting keyword groups.

Table 7 Search query used in the mapping study

Following the analysis developed in Chen et al. (2010), we selected five of the most relevant academic databases: IEEE, Web of Science, Scopus, ACM, and Science Direct. We performed the search in March 2022 using the query described in Table 7 on the title, abstract, and keywords of the published works. We found a total of 162 publications after removing duplicates.

We also developed a snowballing process from the review articles found in the initial search. The idea is to look for potential papers to include in our study by reviewing the references of these review articles. We identified 40 papers in this snowballing process.

5.3 Selection Criteria

For this study, we define one inclusion criterion and four exclusion criteria. The inclusion criterion defined for this study is “the study includes publications that propose, compare or implement methods to measure or analyze the quality of data gathered by IoT systems in the context of air pollution.”

The exclusion criteria for this mapping study are the following: (1) the study excludes papers that do not propose, compare, or implement methods to measure or analyze the quality of data gathered by IoT systems in the context of air pollution; (2) the study excludes papers that are not written in proper English; (3) the study excludes papers that are duplicated or are a previous version of a more complete study about the same research; and (4) the study excludes papers such as systematic reviews, mapping studies, editorials, prefaces, article summaries, interviews, news, correspondence, discussions, comments, readers’ letters, tutorial summaries, panel discussions, opinion articles, poster sessions, classes, abstracts, and presentations.

We apply the inclusion/exclusion criteria to the papers retrieved in the previous step, ensuring that each paper is analyzed by all members of the team. We hold meetings to resolve the conflicts arising from the application of these criteria. The Rayyan web application is used to manage this process (Ouzzani et al., 2016). As a result of this step, 71 papers were selected (see Table 8).

Table 8 Included and excluded publications

5.4 Data Extraction

In this step, we review the selected papers in depth with the aim of extracting the information relevant to answering the research questions. As in Petersen et al. (2015), we divide the selected papers into five sets of 14–15 papers. Each team member extracts information from the papers in her/his set and then reviews the extraction of another team member. Following this process, we ensure that each paper is reviewed by two team members. A weekly meeting is then held to resolve any conflict and reach a common agreement.

6 Results

This section presents the results of the mapping study developed to answer the research questions stated above. Before discussing the main results, we present a general overview of the papers under scope. The analysis of the main topics is presented around three aspects: DQ dimensions and enhancement techniques, endangering factors, and DQ estimation and degradation.

One of the first points to highlight is that the analysis of data quality in the context of IoT-based air quality monitoring systems is a topic of rising interest in the research community, especially in the last 7 years, with an average of about 9 papers per year, as shown in Fig. 2. Although there are some early approaches, such as Harkat et al. (2006), the interest in DQ can be linked to the development and deployment of low-cost monitoring systems.

Fig. 2 Histogram of paper publications per year in the context of data quality in IoT-based air quality monitoring systems

Figure 3 illustrates the venues in which the analyzed works were published. Most of the papers (57.7%) were published in high-quality journals (Q1 or Q2 according to the Scimago ranking). Almost a third of the papers analyzed in this study (31%) were published in conferences. Figure 4 shows the deployment location of the AQ monitoring systems for which DQ is analyzed. These systems have been deployed in 16 different countries, with the USA (10 AQ systems), China and Taiwan (7 systems each), and Switzerland (4 systems) being the countries with the largest number of reported deployments.

Fig. 3 Venue of the publication

Fig. 4 Location of the deployment

Figure 5 presents some details regarding the IoT-based AQ systems. Most of the systems are created specifically for outdoor monitoring (53 out of 71), while 5 works target indoor scenarios and 7 target both indoor and outdoor environments. We also analyzed the portability of these systems, finding 44 implementations in fixed locations, 12 mobile systems, and 4 works that can be used in both fixed and mobile settings.

Fig. 5 System type

Regarding the variables of interest in the AQ monitoring systems, Fig. 6 presents a histogram of the environmental variables identified in our study. The PM2.5 variable is the most frequently analyzed, followed by ozone (O3) and the nitrogen oxides (NOx). This result is in line with expectations, since low-cost PM and gas sensors are more prone to low-quality measurements, as mentioned before.

Fig. 6 Measured air-quality variables

6.1 DQ Dimensions and Enhancement Techniques

This section aims at answering research questions RQ#1 and RQ#2. Regarding RQ#1, “Which are the most relevant DQ dimensions related to IoT-based air quality monitoring systems?,” most of the analyzed works do not refer directly to DQ dimensions as defined in Section 2. We consider that this lack of use of technical DQ concepts in IoT systems is caused by the disconnection between the IoT field and data quality theory.

We identify the DQ dimensions used in IoT systems by looking at the DQ enhancement techniques implemented in those systems, thus answering RQ#2. Figure 7 presents the DQ enhancement techniques implemented in the analyzed works. Calibration (C) is the most used technique, being implemented in about 50% of the works. Most of them implement calibration techniques on-site and at run-time, as depicted in Fig. 8. Data interpolation (I) and outlier detection (O) are also frequently used in air quality monitoring systems, being implemented in 18 works each. Finally, data aggregation and fusion (F) are less frequently implemented, found in only four works.

Fig. 7 DQ enhancement techniques used in AQ monitoring systems (C, calibration; I, data interpolation; O, outlier detection; F, data aggregation/fusion)

Fig. 8 Calibration time and site

According to the discussion in Section 4.5, the calibration technique is directly related to the accuracy, confidence, precision, and concordance dimensions. Data interpolation is related to the completeness, artificiality, accuracy, precision, confidence, and concordance dimensions. Outlier detection is associated with the accuracy, precision, confidence, completeness, and concordance dimensions. Finally, the data aggregation and fusion techniques are linked to the accuracy, precision, artificiality, completeness, data volume, data redundancy, confidence, concordance, and validity dimensions.

Figure 9 presents the relative importance of the DQ dimensions in IoT-based air quality monitoring systems. We define a score representing the relative importance of a dimension, ranging from 0 to 100, where 0 means the dimension is not important and 100 means the dimension is very important. This score is computed as the percentage of appearances of each dimension with respect to the total number of times an enhancement technique is implemented. According to this score, the Precision, Confidence, Concordance, and Accuracy dimensions, each with a score of 100, are considered the most important DQ dimensions for IoT-based air quality monitoring systems. The Completeness and Artificiality dimensions have a lower importance, with scores of 52 and 29, respectively. Finally, the least important dimensions are Validity, Data Volume, and Data Redundancy, with a score of 6.3 each.

Fig. 9 Relative importance of DQ dimensions in IoT-based air quality monitoring systems
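
The sketch below reproduces this score computation using the technique-to-dimension mapping of Section 4.5 and approximate technique counts from Fig. 7 (calibration assumed at 36 of the 71 works); because the counts are approximate, the resulting scores differ slightly from those reported in Fig. 9.

    # Approximate technique counts from Fig. 7 (calibration assumed at 36 works)
    TECHNIQUE_COUNTS = {"calibration": 36, "interpolation": 18, "outlier_detection": 18, "fusion": 4}

    # Technique-to-dimension mapping from Section 4.5
    DIMENSIONS_PER_TECHNIQUE = {
        "calibration": {"accuracy", "precision", "confidence", "concordance"},
        "interpolation": {"accuracy", "precision", "confidence", "concordance",
                          "completeness", "artificiality"},
        "outlier_detection": {"accuracy", "precision", "confidence", "concordance", "completeness"},
        "fusion": {"accuracy", "precision", "confidence", "concordance", "completeness",
                   "artificiality", "data_volume", "redundancy", "validity"},
    }

    total = sum(TECHNIQUE_COUNTS.values())
    scores = {}
    for technique, count in TECHNIQUE_COUNTS.items():
        for dimension in DIMENSIONS_PER_TECHNIQUE[technique]:
            scores[dimension] = scores.get(dimension, 0) + count

    # Score = percentage of technique implementations that involve the dimension
    for dimension, appearances in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{dimension}: {100 * appearances / total:.0f}")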

6.2 Endangering Factors

IoT-based air quality monitoring systems have been gaining popularity and are being included in many new applications. Features like portability, small size, light weight, low cost, and first-hand data generation have motivated the creation of enthusiastic projects related to this topic. For these reasons, the trend shows that this approach will continue growing in the next decade. Figure 10 shows that most of the reviewed works (85.9%) use low-cost sensors.

Fig. 10 Low-cost sensor usage

New technological approaches to air quality measurement have brought new challenges related to the degree of trust in these systems. IoT-based air quality monitoring contrasts with the classic, expensive, robust, and certified air quality monitoring stations, which have been used as the normative reference to determine the risks associated with air quality in overpopulated places around the world.

Moreover, some weaknesses are related to low-cost stations and their application to large-scale air quality monitoring. In particular, DQ can be seriously compromised in low-cost approaches due to the data degradation factors addressed in RQ#3. Among the weaknesses of low-cost sensors, we identified: the method of measurement, sensor aging, lack of redundancy, limited lifetime, and data errors in storage/communication. The details of each identified weakness are given below.

  • Method of measurement. Most low-cost portable AQ sensors measure PM2.5/PM10 and gas concentrations (Budde & Riedel, 2018; Lin et al., 2018). In a smaller proportion, other kinds of gas sensors are used in IoT air quality monitoring applications (Fig. 10). PM sensors implement a widely used technique called “laser detection.” In this technique, a flow of air is pumped by a fan into a chamber. The chamber has a laser light that generates shadows on a light detector when a particle is present. A constant air flow is essential for an accurate reading. An embedded computer attached to the sensor estimates the PM value based on the light detector. Problems associated with this method include miscomputations in the detector, low or high air flow speed in the chamber, high or low environmental temperatures, and high humidity in the air around the sensor (Liu et al., 2017; Penza, 2020).

  • Sensor aging. A high concentration of dirt and dust degrades the sensor response, generating data that can be far from reality (Liu et al., 2017; Manikonda et al., 2016). Periodic maintenance has to be applied in order to avoid this issue. Outdoor sensors also suffer from case degradation, and many malfunctions in such sensors are related to electronic damage caused by water leaking inside the sensor hardware. The sensor’s precision decreases over time, and environmental factors such as humidity and temperature can seriously degrade the data. A periodic calibration strategy must be applied to mitigate this issue.

  • Lack of redundancy. Redundancy is an easy way to determine when a particular sensor is malfunctioning. In IoT AQ monitoring systems, sensor redundancy consists of including two or more sensors in order to compare the deviation of their readings. However, some applications face serious problems when applying node redundancy due to size and battery restrictions. Another kind of redundancy can be achieved by analyzing spatio-temporal data from nearby sensors (Feinberg et al., 2019; Li et al., 2018; Lin et al., 2020a). This technique is a computationally demanding task that usually has to run on a gateway node and, again, can be hard to accomplish in some scenarios.

  • Limited lifetime. Battery-powered systems are widely used in low-cost electronic solutions, and AQ measurement systems are no exception. The typical AQ device includes a PM sensor, an on-board computer, and communication and storage interfaces, all of which demand power from the battery. In a poorly planned AQ IoT solution, the battery lifetime can be very short due to system inefficiency (Penza, 2020). Power management design techniques such as low-power communications, extensive use of low-power modes in the processors, and time/event-driven software development, among others, have to be applied in order to extend the lifetime of the batteries (Kendrick et al., 2019).

  • Data errors in storage/communication. It is common for electronic systems to have errors in communication and storage processes. Those errors can be caused by electromagnetic noise in the environment, battery degradation, PCB malfunctions, or exceeded memory capacity, among others (Budde and Riedel, 2018; Kaivonen & Ngai, 2020). A sloppy software implementation can also be a source of errors. Those malfunctions are usually hard to find in the design process, even for an experienced engineer. A well-planned set of tests has to be applied to the hardware and software in order to minimize the probability of these malfunctions (Kendrick et al., 2019).

6.3 Data Quality Estimation and Degradation

To answer RQ#4 (How is DQ estimated?) and RQ#5 (How is DQ degradation identified?), we conclude from our review that DQ is estimated based on its dimensions, which, at the same time, allows one to see whether there is any degradation. For example, in AQ monitoring systems, a degradation can be identified when an indicator is not within the limits given by the Data Quality Objectives (DQOs), and it is common to find that authors define which indicators are important to them and set their own thresholds. According to international guidelines (EPA, 2017; UNION et al., 2008), the quality of data in air quality monitoring systems should be estimated based on the Data Quality Indicators (DQI), as reviewed in Section 3. However, most of the studies focus only on the accuracy, precision, concordance, and confidence of the data, as shown in Fig. 9.

For the evaluation of the DQI related to the aforementioned dimensions, it is necessary to have a reference measure. In the studies included in our review, we found a wide variety of references (see Fig. 11). The most frequently used references are city-scale stations, since these devices are often calibrated and can provide a reliable measure. Other references for laboratory calibration have been found, such as calibrated sensors or calibration chambers. For on-site calibration, using neighboring sensors as a reference is a convenient option. Finally, we identified other methods, such as the use of historical data from publicly available datasets, and the use of measurements from a NASA aircraft that collected AQ measurements in the regions of interest (Duvall et al., 2016).

Fig. 11 Distribution of articles that compare low-cost sensor measurements to a reference

Even though other DQ dimensions and indicators are barely mentioned or estimated, authors clearly state the need for low-cost sensors to increase the spatial and temporal resolution of the system, which implies enhancing DQ indicators such as representativeness, minimum data capture, minimum time coverage, and minimum number of sampling points. This means that, beyond data accuracy, authors are indirectly affecting other indicators that impact the overall application’s DQ. Moreover, the treatment of DQ by several authors does not adhere to any formal concepts or definitions; for instance, DQ estimation and DQ degradation are not treated separately, and there is no clear indication of when DQ starts to degrade: DQ is merely calculated, and sometimes compared to thresholds, within the same process.

6.4 Citations

The following works were included in this systematic mapping study: Xie et al. (2019), Tang (2016), Saukh et al. (2015), Ma et al. (2020), Okafor et al. (2020), Huang et al. (2020a), Kim et al. (2018), Kizel et al. (2018), Fang and Bate (2017b), Alavi-Shoshtari et al. (2013), Bart et al. (2014), Harkat et al. (2006), Heimann et al. (2015), De Vito et al. (2008), Castell et al. (2017), Duvall et al. (2016), Hasenfratz et al. (2012), Alvarado et al. (2015), Moltchanov et al. (2015), Kelly et al. (2017), Rajasegarar et al. (2014b), Piedrahita et al. (2014), Sun et al. (2016), Talampas and Low (2012), Weissert et al. (2017), Mead et al. (2013), Chen et al. (2018), Nguyen et al. (2019), Yuan et al. (2016), Jain and Shah (2017), Chung and Kim (2020), Lee et al. (2019), Alavi-Shoshtari et al. (2018), Barcelo-Ordinas et al. (2018), Lin et al. (2018), van Zoest et al. (2019), Wang et al. (2015), Jiao et al. (2016), Mueller et al. (2017), Kendrick et al. (2019), Rai et al. (2017), Feinberg et al. (2019), Rajasegarar et al. (2014a), Fang and Bate (2017a), Wang et al. (2020), Hao et al. (2015), Benabbas et al. (2019), Fu et al. (2017), Kotsev et al. (2016), Markert et al. (2016), Kaivonen and Ngai (2020), Harrou et al. (2018), Maag et al. (2017), Wang et al. (2017), Li et al. (2018), Qin et al. (2020), Lin et al. (2020a), Orlowski et al. (2019), Carratu et al. (2020), Buelvas et al. (2021), Chu et al. (2020), Connolly et al. (2022), Cui et al. (2021), Hofman et al. (2020), Hofman et al. (2022), Li et al. (2022), Lin et al. (2022), Marathe et al. (2021), Qiao et al. (2021), Rezapour and Tzeng (2021), Rivera-Munoz et al. (2021), Rollo and Po (2021), and Van Zoest et al. (2021).

7 Discussion

An increasing interest in analyzing the DQ topic is depicted in Fig. 2, which can be interpreted as a result of the number of low-cost IoT systems deployed for AQ monitoring (see Fig. 10). However, we found that few authors make an explicit mention of the DQ dimensions addressed in their work. Instead, they mention terms derived from “Data Quality,” where the DQ information is diffuse. We believe that the main reason for this phenomenon is that the language of data quality has not yet been used in a properly formal way within IoT and air quality monitoring applications. This is a serious obstacle to assessing AQ measurement under the DQ definitions presented here.

Accuracy was the DQ indicator most frequently mentioned by authors to measure “quality” in AQ systems. Nevertheless, the introduction of other indicators would provide a more reliable and realistic approach to IoT AQ measurement. Defining the minimum DQ dimensions and indicators that a low-cost AQ system should provide is a challenge that has to be addressed by different actors, such as environmental agencies, enthusiastic developers, and technological industries around the world. We consider that environmental agencies have shown resistance to the implementation of portable and low-cost AQ supervision systems due to factors such as the method of measurement, sensor aging, lack of redundancy, limited lifetime, and data errors in storage/communication. As long as these problems remain serious and unresolved, low-cost AQ supervision will not be considered a real alternative to determine the AQ in large-scale applications. On the other hand, low-cost sensors in the context of AQ applications have been growing as an alternative to empower citizens around the world. This tendency offers many challenges and opportunities, which underscores the importance of adequate DQ definitions in those applications.

Using DQIs or DQ dimensions as a way to evaluate the status of an air quality monitoring system can be a proper approach, since it considers the attributes that are really important for the users within a context. This approach can provide a complete view of the system’s DQ status and also allows checking for specific degraded features that can be improved by using DQ enhancing techniques (see Fig. 1). Also, by identifying the endangering factors, improvements in the system’s infrastructure can be targeted to mitigate their impact on the overall DQ of the system.

Therefore, this work found that dimensions and indicators are not mentioned explicitly by authors due to the lack of proper usage of DQ dimension and indicator definitions, as well as the fact that most authors do not adhere to the guidelines that standardize air quality monitoring. In order to mitigate this issue, as future work we propose the development of a tool that can be used to identify and sort the dimensions and indicators for IoT-based AQ monitoring systems.

8 Conclusion and Future Work

In this paper, we studied data quality analysis in IoT-based air quality monitoring systems. First, we provided a general overview of data quality dimensions within an IoT context. Then, we reported the data quality indicators and objectives in air quality monitoring systems according to the guidelines of regulatory entities. We also proposed a mapping from indicators to dimensions to determine the relation between these concepts. In order to establish the state of data quality in IoT-based AQ systems, we developed a systematic mapping study of this field. The results showed an increasing number of studies in the last few years that take into account terms related to DQ within IoT-based air quality monitoring systems; however, there is a lack of adoption of DQ terminology and of rigorous application of DQ metrics. For instance, we had to identify the most relevant DQ dimensions related to IoT-based air quality systems indirectly, by analyzing the enhancement techniques used. To this end, we created a mapping between the enhancement techniques and the DQ dimensions.

In general, we found that authors do not use the terminology of the DQ field. We attribute this to two different factors. First, there is an absence of regulations that take into account indicative measurements (such as low-cost sensor measurements) in the evaluation of air quality. Second, authors ignore the existing guidelines because they are not required to follow them; the primary objectives of their research are to evaluate technological alternatives or data processing techniques.

It is understandable why low-cost sensor measurements are not fully considered by the agencies in charge of environmental monitoring, since their data are prone to more errors than those of a robust station. However, to overcome such distrust of low-cost sensors, an air quality monitoring system can be implemented to be DQ-aware and to include techniques to improve the quality of its data. In addition, many low-cost sensors can complement a few robust stations to improve the resolution of the system, using the robust stations directly as references or as sources of data to build reference models that help improve DQ in low-cost air pollution sensors, for example, in calibration processes.