Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya


Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_216-1



Spatial big data is spatio-temporal data that is too large, or that requires computation too data-intensive, for traditional computing architectures. Stream processing in this context is the processing of spatio-temporal data in motion. The data is observational; it is produced by sensors, moving or otherwise. Computations on the data are made as the data is produced or received. A distributed processing cluster is a networked collection of computers that communicate and process data in a coordinated manner to solve a common problem. A lambda architecture is a scalable, fault-tolerant data-processing architecture designed to handle large quantities of data by exploiting both stream and batch processing methods. Data partitioning involves physically dividing a dataset into separate data stores on a distributed processing cluster. This...
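As an illustration of the data partitioning idea defined above, the following minimal sketch divides spatio-temporal observations among a fixed number of stores by fixed-size grid cell. The grid scheme, the cell size, the mixing constants, and the names grid_cell, partition_id, and partition_points are assumptions made for this example; the entry does not prescribe a particular partitioning method.

```python
from collections import defaultdict
from math import floor


def grid_cell(lon, lat, cell_deg=1.0):
    """Map a longitude/latitude pair to an integer grid cell (column, row).

    cell_deg is a hypothetical cell size in degrees, chosen for illustration.
    """
    return (floor(lon / cell_deg), floor(lat / cell_deg))


def partition_id(cell, num_partitions):
    """Assign a grid cell to one of num_partitions stores.

    Uses simple deterministic integer mixing so that the same cell always
    maps to the same store; the constants are arbitrary large primes.
    """
    col, row = cell
    return ((col * 73856093) ^ (row * 19349663)) % num_partitions


def partition_points(points, num_partitions, cell_deg=1.0):
    """Group (lon, lat, timestamp) observations by their target store."""
    stores = defaultdict(list)
    for point in points:
        lon, lat, _timestamp = point
        cell = grid_cell(lon, lat, cell_deg)
        stores[partition_id(cell, num_partitions)].append(point)
    return dict(stores)


if __name__ == "__main__":
    # Made-up observations: (longitude, latitude, timestamp).
    observations = [
        (-117.19, 34.05, 0),
        (-117.18, 34.06, 1),
        (144.96, -37.81, 2),
    ]
    for store, members in sorted(partition_points(observations, 4).items()):
        print(store, members)
```

Observations that fall in the same grid cell always land in the same store, which keeps nearby data together for local computation. A production system would likely also need to handle skewed cells and geometries that span cell boundaries, which this sketch ignores.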




Authors and Affiliations

  1. Environmental Systems Research Institute, Redlands, USA

Section editors and affiliations

  • Timos Sellis (1)
  • Aamir Cheema
  1. Data Science Research Institute, Swinburne University of Technology, Melbourne, Australia