Big Spatial Data Management for the Internet of Things: A Survey

Journal of Network and Systems Management

Abstract

The abundance of IoT devices has caused an unprecedented accumulation of geo-referenced IoT spatial data that, if analyzed correctly, would reveal important information. This information can feed decision support systems for better decision making and strategic planning regarding important aspects of our lives that depend heavily on location-based services. Several spatial data management systems for IoT data in the Cloud have recently gained momentum. However, the literature still lacks a comprehensive survey that conceptualizes a convenient framework for classifying those systems under appropriate categories. In this survey paper, we focus on the management of big geospatial data generated by IoT data sources. We also define a conceptual framework and match the works of the recent literature against it. We then identify future research frontiers in the field based on the surveyed works.

Notes

  1. http://geohash.org/.

Abbreviations

BSO: Boundary spatial objects
BSP: Binary space partition
DB: Database
DBSCAN-MR: Density-based spatial clustering of applications with noise, MapReduce
GI: Global indexing
GIS: Geographic information system
IoT: Internet of Things
JSON: JavaScript object notation
kNN: k nearest neighbors
MBR: Minimum bounding rectangle
MLI: Multi-level index
NoSQL: Not-only SQL
ODSI: On demand spatial indexing
OLI: One-layer index
QoS: Quality of service
RDD: Resilient distributed datasets
SAQP: Spatial approximate query processing
SCI: Spatial coding index
SDL: Spatial data locality
SDME: Spatial data management engine
SDMS: Spatial data management system
SFC: Space-filling curves
SLA: Service level agreement
SLR: Systematic literature review
SPE: Spatial processing engine
SRDD: Spatial RDD
STR: Sort-tile recurse

Acknowledgements

This research was supported by the IDEHA project funded by PON “RICERCA E INNOVAZIONE” 2014–2020 (No. J46C18000440008) and by the SACHER (Smart Architecture for Cultural Heritage in Emilia Romagna) project funded by the POR-FESR 2014-20 (No. J32I16000120009).

Author information

Corresponding author

Correspondence to Paolo Bellavista.

About this article

Cite this article

Al Jawarneh, I.M., Bellavista, P., Corradi, A. et al. Big Spatial Data Management for the Internet of Things: A Survey. J Netw Syst Manage 28, 990–1035 (2020). https://doi.org/10.1007/s10922-020-09549-6

  • DOI: https://doi.org/10.1007/s10922-020-09549-6
