Efficient spatiotemporal interpolation with Spark machine learning

  • Research Article
Earth Science Informatics

Abstract

An appropriate spatiotemporal interpolation is critical for assessing the relationships between environmental exposures and health outcomes. Traditional spatiotemporal interpolation methods either consider the spatial and temporal dimensions separately or incorporate both simultaneously by simply treating time as another dimension in space. Such interpolation results suffer from relatively low accuracy because the true space-time domain is skewed inappropriately and distances calculated in such a domain are not accurate. We employ the efficient k-d tree structure to store spatiotemporal data and adopt several machine learning methods to learn optimal parameters. To overcome the computational difficulty with large data sets, we implement our method on an efficient cluster computing framework, Apache Spark. Real-world PM2.5 data sets are used to test our implementation, and the experimental results demonstrate the computational power of our method, which significantly outperforms the previous work in terms of both speed and accuracy.
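To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the learned parameter is a single factor `c` that rescales time before it is treated as a third coordinate, and that the interpolant is inverse distance weighting over the `k` nearest samples found with a k-d tree. The names `build_kdtree`, `knn`, `scale`, `idw_estimate`, `c`, `k`, and `power` are hypothetical and do not come from the article; the sample values are made up for illustration.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Build a k-d tree from (coords, value) pairs as nested tuples (or None)."""
    if not points:
        return None
    axis = depth % len(points[0][0])          # cycle through x, y, scaled t
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                    # median sample becomes the node
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn(node, query, k, heap=None):
    """Collect the k nearest samples in a heap keyed on negated distance."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    (coords, value), axis, left, right = node
    dist = math.dist(coords, query)
    item = (-dist, coords, value)
    if len(heap) < k:
        heapq.heappush(heap, item)
    elif dist < -heap[0][0]:                  # closer than the current worst
        heapq.heapreplace(heap, item)
    diff = query[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, query, k, heap)
    # Visit the far side only if the splitting plane is within the search radius.
    if len(heap) < k or abs(diff) < -heap[0][0]:
        knn(far, query, k, heap)
    return heap


def scale(x, y, t, c):
    """Assumption: time is multiplied by c so it is comparable to space."""
    return (x, y, c * t)


def idw_estimate(samples, query, c=1.0, k=3, power=2.0):
    """Inverse distance weighting over the k nearest (x, y, c*t) samples."""
    tree = build_kdtree([(scale(*p, c), v) for p, v in samples])
    num = den = 0.0
    for neg_d, _, value in knn(tree, scale(*query, c), k):
        d = -neg_d
        if d == 0.0:                          # query coincides with a sample
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Illustrative (made-up) samples: ((x, y, t), measured value).
samples = [((0.0, 0.0, 0.0), 10.0),
           ((2.0, 0.0, 0.0), 20.0),
           ((0.0, 0.0, 5.0), 100.0)]
print(idw_estimate(samples, (1.0, 0.0, 0.0), c=1.0, k=2))  # prints 15.0
```

In this sketch a smaller `c` pulls temporally distant samples closer to the query, so the choice of `c` changes which neighbours dominate the estimate. That sensitivity is one plausible reason a scaling parameter of this kind would be tuned from data, for example by cross-validation, rather than fixed by hand.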

Acknowledgements

We would like to thank Brandon Kimmons, Director of Computational Research Technical Support at Georgia Southern University, for helping us set up Spark. Franklin, Tong, and Zhou were supported in part by funds from the Office of the Vice President for Research & Economic Development at Georgia Southern University. Beseny, Franklin, Li, and Tong were supported in part by cooperative engineering and health sciences faculty seed grants from the Allen E. Paulson College of Engineering & Information Technology, Georgia Southern University, and the College of Allied Health Sciences, Augusta University.

Author information

Corresponding author

Correspondence to Weitian Tong.

Additional information

Communicated by: H. A. Babaie

About this article

Cite this article

Tong, W., Li, L., Zhou, X. et al. Efficient spatiotemporal interpolation with spark machine learning. Earth Sci Inform 12, 87–96 (2019). https://doi.org/10.1007/s12145-018-0364-4
