Missing Data Estimation in a Low-Cost Sensor Network for Measuring Air Quality: a Case Study in Aburrá Valley

Abstract

According to the World Health Organization (WHO), air pollution is currently one of the leading causes of death worldwide. As a result, several projects have emerged to monitor air quality using low-cost Wireless Sensor Networks (WSNs). However, the type of technology and the sensors’ location affect data quality and lead to a considerable amount of missing data. This hinders the proper application of methodologies for sensor calibration and the use of the data for pollutant dispersion analysis and the prediction of pollution episodes. This paper presents a methodology based on matrix factorization (MF) to recover missing data from a low-cost WSN that measures particulate matter PM2.5. Applying the proposed methodology to a case study in Aburrá Valley, Colombia, it is shown that it is possible to recover 40% of missing data with an error below 12%, which is better than the results reported for other methods in the literature.
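
To make the idea of recovering missing readings with matrix factorization concrete, the following is a minimal sketch, not the authors' implementation. It assumes a matrix whose rows are time steps and whose columns are sensors, marks missing readings as NaN, and fits a low-rank model by stochastic gradient descent on the observed entries only. The function name `mf_impute`, its parameters (`rank`, `lr`, `reg`, `epochs`) and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def mf_impute(X, rank=2, lr=0.01, reg=0.01, epochs=200, seed=0):
    """Fill NaN entries of X using a low-rank factorization (illustrative sketch).

    X is a (time steps x sensors) array with NaN marking missing readings.
    Only the observed entries are used to fit the factors P and Q.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape

    # Standardize with statistics of the observed entries (assumed preprocessing).
    mean = np.nanmean(X)
    std = np.nanstd(X) or 1.0
    Z = (X - mean) / std

    observed = np.argwhere(~np.isnan(Z))  # (row, col) index pairs
    P = 0.1 * rng.standard_normal((n_rows, rank))  # time-step factors
    Q = 0.1 * rng.standard_normal((n_cols, rank))  # sensor factors

    for _ in range(epochs):
        rng.shuffle(observed)  # visit observed entries in random order
        for i, j in observed:
            err = Z[i, j] - P[i] @ Q[j]
            p_old = P[i].copy()
            # Gradient step on the squared error with L2 regularization.
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])

    X_hat = mean + std * (P @ Q.T)
    # Keep observed readings untouched; only fill the gaps.
    return np.where(np.isnan(X), X_hat, X)


# Synthetic example (not SIATA data): 60 time steps, 8 sensors, 40% missing.
rng = np.random.default_rng(42)
daily_pattern = 30 + 10 * np.sin(np.linspace(0, 4 * np.pi, 60))
sensor_gain = rng.uniform(0.8, 1.2, size=8)
X_true = np.outer(daily_pattern, sensor_gain)

X_missing = X_true.copy()
X_missing[rng.random(X_true.shape) < 0.4] = np.nan

X_filled = mf_impute(X_missing)
gaps = np.isnan(X_missing)
mape = np.mean(np.abs(X_filled[gaps] - X_true[gaps]) / X_true[gaps])
print(f"Mean absolute percentage error on recovered entries: {mape:.3f}")
```

The error printed here describes only this toy data set and says nothing about the results reported in the paper. The rank and learning rate would need tuning on real measurements.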

Notes

  1. https://www.londonair.org.uk/LondonAir/

  2. https://waqi.info/

Acknowledgements

The authors would like to acknowledge the cooperation of Sistema de Alerta Temprana del Valle de Aburrá (SIATA) in providing the data, as well as the installed capacity project PCI21108 of the research group MIRP - Instituto Tecnológico Metropolitano (ITM).

Author information

Corresponding author

Correspondence to León M. Rivera-Muñoz.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rivera-Muñoz, L.M., Gallego-Villada, J.D., Giraldo-Forero, A.F. et al. Missing Data Estimation in a Low-Cost Sensor Network for Measuring Air Quality: a Case Study in Aburrá Valley. Water Air Soil Pollut 232, 436 (2021). https://doi.org/10.1007/s11270-021-05363-1

Keywords

  • Matrix factorization
  • Machine learning
  • Low-cost sensors network
  • Missing data estimation