Abstract
Recent years have seen the emergence of many new and valuable spatial datasets, such as trajectories of cell phones and Global Positioning System (GPS) devices, vehicle engine measurements, global climate model simulation data, volunteered geographic information (VGI), geo-social media, and tweets. The value of these datasets is already evident in many societal applications, including disaster management and disease outbreak prediction. However, these location-aware datasets have a volume, variety, and velocity that exceed the capability of current CyberGIS technologies. We refer to these datasets as Spatial Big Data. In this chapter, we define spatial big data in terms of its value proposition and user experience, both of which depend on the computational platform, use case, and dataset at hand. We compare spatial big data with traditional spatial data and with other types of big data. We then provide an overview of the current efforts, challenges, and opportunities that arise when spatial big data is enabled via next-generation CyberGIS. Our discussion covers current accomplishments and opportunities from both an analytics and an infrastructure perspective.
Copyright information
© 2019 Springer Science+Business Media B.V., part of Springer Nature
About this chapter
Cite this chapter
Evans, M.R., Oliver, D., Yang, K., Zhou, X., Ali, R.Y., Shekhar, S. (2019). Enabling Spatial Big Data via CyberGIS: Challenges and Opportunities. In: Wang, S., Goodchild, M. (eds) CyberGIS for Geospatial Discovery and Innovation. GeoJournal Library, vol 118. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-1531-5_8
DOI: https://doi.org/10.1007/978-94-024-1531-5_8
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-1529-2
Online ISBN: 978-94-024-1531-5
eBook Packages: Earth and Environmental Science, Earth and Environmental Science (R0)