Abstract
Spatial data science is a multidisciplinary field that applies scientific methods to acquire, store, and manage spatial data, and to extract previously unknown, non-trivial, and potentially useful knowledge and insights from those data. It matters for societal applications in public health, public safety, agriculture, environmental science, and climate, among others. Its challenges arise from its interdisciplinary nature and from the unique properties of spatial data, such as spatial autocorrelation and spatial heterogeneity. In this chapter, we discuss spatial data science across its life cycle: data acquisition, data storage, data mining, result validation, and domain interpretation.
Copyright information
© 2023 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Li, Y., Xie, Y., Shekhar, S. (2023). Spatial Data Science. In: Rokach, L., Maimon, O., Shmueli, E. (eds) Machine Learning for Data Science Handbook. Springer, Cham. https://doi.org/10.1007/978-3-031-24628-9_18
DOI: https://doi.org/10.1007/978-3-031-24628-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24627-2
Online ISBN: 978-3-031-24628-9
eBook Packages: Mathematics and Statistics (R0)