Abstract
Urban air pollution poses a significant global health risk, but because measuring air quality is expensive, data on pollutant exposure have generally been scarce. In recent years this has motivated the development of several cheap, portable air quality monitoring instruments. However, these instruments tend to be unreliable, so their raw measurements require preprocessing before they can yield accurate predictions of actual air quality conditions, which makes them an apt target for machine learning techniques. In this paper we use a dataset of measurements from a low-cost air quality instrument, the ODIN-SD, to examine which techniques are most appropriate and what the limitations of such an approach are. From theoretical and experimental considerations, we conclude that a robust linear regression over measurements of air quality metrics, together with relative humidity and temperature measurements, produces the most accurate model. We also discuss the issues of concept drift that arise in this context, and quantify how much training data is required to balance predictive accuracy against efficient data collection.
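To make the modelling idea in the abstract concrete, the following is a minimal sketch of a robust linear regression that maps a raw low-cost sensor reading, relative humidity, and temperature to a reference value. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which robust estimator was used, so Huber weighting fitted by iteratively reweighted least squares is assumed here as one common choice. The function name `huber_irls`, the synthetic data, and the coefficient values are all hypothetical.

```python
import numpy as np


def huber_irls(X, y, delta=1.345, n_iter=50, tol=1e-8):
    """Fit y ~ intercept + X @ coef using Huber weights via IRLS.

    Returns an array [intercept, coef_1, ..., coef_p].
    """
    A = np.column_stack([np.ones(len(X)), X])        # prepend intercept column
    beta = np.linalg.lstsq(A, y, rcond=None)[0]      # ordinary least squares start
    for _ in range(n_iter):
        resid = y - A @ beta
        # Robust scale: median absolute deviation, rescaled for Gaussian noise
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-12)
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, delta / u for large ones
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Synthetic stand-in data (assumption: not the ODIN-SD dataset).
rng = np.random.default_rng(0)
n = 500
raw_pm = rng.uniform(0, 60, n)    # hypothetical raw sensor reading
rh = rng.uniform(30, 95, n)       # relative humidity, percent
temp = rng.uniform(0, 25, n)      # temperature, degrees Celsius
X = np.column_stack([raw_pm, rh, temp])

true_beta = np.array([2.0, 0.8, -0.05, 0.1])   # hypothetical coefficients
reference = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 1.0, n)

# Corrupt 5% of the reference values with gross errors to mimic faulty readings.
outliers = rng.choice(n, size=25, replace=False)
reference[outliers] += 80.0

beta_robust = huber_irls(X, reference)
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), X]), reference, rcond=None)[0]

print("robust:", np.round(beta_robust, 3))
print("OLS:   ", np.round(beta_ols, 3))
```

In this sketch the gross errors get small weights, so the robust coefficients stay close to the values used to generate the data, whereas the ordinary least squares fit is pulled toward the corrupted points.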
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huggard, H., Koh, Y.S., Riddle, P., Olivares, G. (2019). Predicting Air Quality from Low-Cost Sensor Measurements. In: Islam, R., et al. Data Mining. AusDM 2018. Communications in Computer and Information Science, vol 996. Springer, Singapore. https://doi.org/10.1007/978-981-13-6661-1_8
DOI: https://doi.org/10.1007/978-981-13-6661-1_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6660-4
Online ISBN: 978-981-13-6661-1
eBook Packages: Computer Science, Computer Science (R0)