A Novel Approach to Detect Missing Values Patterns in Time Series Data

  • Juan-Fernando Lima
  • Patricia Ortega-Chasi
  • Marcos Orellana Cordero
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1099)

Abstract

The growing number of environmental sensors deployed to capture the behavior of cities produces large amounts of shared data. However, missing values are unavoidable, and they become a critical problem for studies that require data analysis over extensive periods. The problem is most evident in longitudinal studies, since they depend on data collected over long periods. Hence, a convenient approach is to support the data collection rules by determining the behavior of common missing data slots. This is possible by discovering missing data patterns in time series through four steps: (1) defining the data matrices, (2) computing and categorizing the missed periods using the proposed algorithm, (3) identifying the time analysis scenarios, and (4) applying the Kernel Density Estimation algorithm. This paper describes the experimentation of this method using a real air quality dataset from Cuenca, Ecuador, collected over one year. The results show that the proposed approach is useful for revealing missing data patterns. The approach also provides a good starting point for companies and laboratories interested in improving their data collection rules.
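The general idea behind steps (2) and (4) can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify: the function names `missing_runs` and `gaussian_kde`, the toy hourly readings, and the bandwidth of 1.0 are all assumptions made for illustration. The sketch locates runs of consecutive missing readings and then estimates, with a hand-written Gaussian kernel density estimate, the hour of day at which gaps tend to concentrate.

```python
import math
from itertools import groupby


def missing_runs(values):
    """Return (start_index, length) for each run of consecutive None readings."""
    runs = []
    index = 0
    for is_missing, group in groupby(values, key=lambda v: v is None):
        length = len(list(group))
        if is_missing:
            runs.append((index, length))
        index += length
    return runs


def gaussian_kde(samples, bandwidth):
    """Build a one-dimensional Gaussian kernel density estimate over samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical data: 48 hourly readings (two days); None marks a missing value.
readings = [10.0] * 48
for i in (2, 3, 4, 27, 28, 39):
    readings[i] = None

# Locate the gaps and their lengths.
runs = missing_runs(readings)

# Hour of day (0-23) of every missing reading, assuming index 0 is midnight.
missing_hours = [i % 24 for i, v in enumerate(readings) if v is None]

# Estimate where in the day missing values concentrate (assumed bandwidth).
kde = gaussian_kde(missing_hours, bandwidth=1.0)
peak_hour = max(range(24), key=kde)

print(runs)
print(peak_hour)
```

This sketch treats the hour of day as a point on a line, so gaps just before and just after midnight are not counted as neighbors. A circular kernel would be a more faithful choice for real data.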

Keywords

Missing values patterns, Compute missing values, Kernel Density Estimation

Notes

Acknowledgments

This work is part of the “Aplicación de minería de datos en el análisis de asociaciones entre contaminantes atmosféricos y variables meteorológicas” project, supported by the University of Azuay. The authors also thank Chester Sellers and EMOV-EP for providing access to air quality data for Cuenca, Ecuador.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Departamento de Investigación y Desarrollo en Informática, Universidad Del Azuay, Cuenca, Ecuador
