
GeoInformatica, Volume 22, Issue 4, pp 785–813

ST-Hadoop: a MapReduce framework for spatio-temporal data

  • Louai Alarabi
  • Mohamed F. Mokbel
  • Mashaal Musleh
Article

Abstract

This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness into each of their layers, mainly the language, indexing, and operations layers. In the language layer, ST-Hadoop provides built-in spatio-temporal data types and operations. In the indexing layer, ST-Hadoop spatiotemporally loads and divides data across computation nodes in the Hadoop Distributed File System in a way that mimics spatio-temporal index structures, which results in orders of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and queries. In the operations layer, ST-Hadoop ships with support for three fundamental spatio-temporal queries, namely spatio-temporal range, top-k nearest neighbor, and join queries. The extensibility of ST-Hadoop allows others to extend its features and operations easily using approaches similar to those described in the paper. Extensive experiments conducted on a large-scale dataset of size 10 TB that contains over 1 billion spatio-temporal records show that ST-Hadoop achieves orders of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and operations. The key idea behind the performance gain in ST-Hadoop is its ability to index spatio-temporal data within the Hadoop Distributed File System.
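
The indexing idea in the abstract can be illustrated with a small sketch. This is not ST-Hadoop's implementation or API; it is a single-process Python illustration that assumes a hypothetical partitioning scheme (one temporal slice per day, one spatial cell per degree) and hypothetical names (`CELL_DEG`, `partition_key`, `build_index`, `range_query`). It shows only why grouping records by time and space lets a spatio-temporal range query skip most partitions instead of scanning all of the data.

```python
from collections import defaultdict
from datetime import datetime

# Assumed spatial cell size in degrees (hypothetical, not taken from the paper).
CELL_DEG = 1.0


def partition_key(x, y, t):
    """Map a record to a (day, cell_x, cell_y) partition (assumed scheme)."""
    return (t.date(), int(x // CELL_DEG), int(y // CELL_DEG))


def build_index(records):
    """Group (x, y, timestamp) records by their spatio-temporal partition."""
    index = defaultdict(list)
    for x, y, t in records:
        index[partition_key(x, y, t)].append((x, y, t))
    return index


def range_query(index, x_min, x_max, y_min, y_max, t_min, t_max):
    """Return matching records and the number of partitions actually scanned."""
    cx_lo, cx_hi = int(x_min // CELL_DEG), int(x_max // CELL_DEG)
    cy_lo, cy_hi = int(y_min // CELL_DEG), int(y_max // CELL_DEG)
    hits, scanned = [], 0
    for (day, cx, cy), recs in index.items():
        # Prune whole partitions that cannot overlap the query box.
        if not (t_min.date() <= day <= t_max.date()):
            continue
        if not (cx_lo <= cx <= cx_hi and cy_lo <= cy <= cy_hi):
            continue
        scanned += 1
        # Refine: check every record inside a surviving partition.
        hits.extend(
            (x, y, t)
            for x, y, t in recs
            if x_min <= x <= x_max and y_min <= y <= y_max and t_min <= t <= t_max
        )
    return hits, scanned
```

In a distributed setting each partition would correspond to a separately stored block, so a pruned partition is data that is never read. The day and degree granularities here are arbitrary choices made for the illustration.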

Keywords

MapReduce-based systems; Spatio-temporal systems; Spatio-temporal range query; Spatio-temporal nearest neighbor query; Spatio-temporal join query



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA
