Enabling Spatial Big Data via CyberGIS: Challenges and Opportunities

CyberGIS for Geospatial Discovery and Innovation

Part of the book series: GeoJournal Library (GEJL, volume 118)

Abstract

Recent years have seen the emergence of many new and valuable spatial datasets, such as trajectories of cell phones and Global Positioning System (GPS) devices, vehicle engine measurements, global climate model simulation data, volunteered geographic information (VGI), geo-social media, and tweets. The value of these datasets is already evident in many societal applications, including disaster management and disease outbreak prediction. However, these location-aware datasets are of a volume, variety, and velocity that exceed the capability of current CyberGIS technologies. We refer to these datasets as Spatial Big Data. In this chapter, we define spatial big data in terms of its value proposition and user experience, which depend on the computational platform, use case, and dataset at hand. We compare spatial big data with traditional spatial data and with other types of big data. We then provide an overview of the current efforts, challenges, and opportunities that arise when spatial big data is enabled via next-generation CyberGIS. Our discussion covers current accomplishments and opportunities from both an analytics and an infrastructure perspective.

Author information

Corresponding author

Correspondence to Reem Y. Ali.

Copyright information

© 2019 Springer Science+Business Media B.V., part of Springer Nature

About this chapter

Cite this chapter

Evans, M.R., Oliver, D., Yang, K., Zhou, X., Ali, R.Y., Shekhar, S. (2019). Enabling Spatial Big Data via CyberGIS: Challenges and Opportunities. In: Wang, S., Goodchild, M. (eds) CyberGIS for Geospatial Discovery and Innovation. GeoJournal Library, vol 118. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-1531-5_8
