Abstract
With the growth of mobile and Internet services, large volumes of data with spatial components are being generated. Users now demand fast, scalable and cost-effective solutions for running analytics on massively distributed data, including spatial data. Traditional spatial data management systems struggle to meet this demand because of poor scalability and limited computational power and storage. A promising approach is to develop data-intensive spatial applications on parallel distributed architectures deployed on commodity clusters. This paper presents an open-source big data analytics framework to load, store, process and perform ad-hoc query processing on spatial and non-spatial data at scale. The system is built on top of the Spark framework, with the NoSQL database Cassandra as a new input data source. It implements analytics operations such as filtering, aggregation, exact match, proximity and k-nearest-neighbor search. It also provides an application architecture that accelerates ad-hoc query processing by routing each user query to the more suitable framework, either Cassandra or Spark, through a common web-based REST interface. The framework is evaluated by measuring query latency against varying data sizes.
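To make the proximity and k-nearest-neighbor operations named in the abstract concrete, the following is a minimal single-machine sketch in plain Python. It is not the paper's Spark or Cassandra implementation: the function names, the haversine distance metric, the `(name, lat, lon)` record layout and the sample points are all assumptions chosen for illustration.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity(points, lat, lon, radius_km):
    """Return every (name, lat, lon) record within radius_km of the query point."""
    return [p for p in points
            if haversine_km(lat, lon, p[1], p[2]) <= radius_km]


def k_nearest(points, lat, lon, k):
    """Return the k records closest to the query point, nearest first."""
    return heapq.nsmallest(
        k, points, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


# Hypothetical sample records: (name, latitude, longitude) in degrees.
SAMPLE = [
    ("ahmedabad", 23.0225, 72.5714),
    ("gandhinagar", 23.2156, 72.6369),
    ("vadodara", 22.3072, 73.1812),
    ("mumbai", 19.0760, 72.8777),
]

if __name__ == "__main__":
    query_lat, query_lon = 23.0225, 72.5714
    print([p[0] for p in proximity(SAMPLE, query_lat, query_lon, 50.0)])
    print([p[0] for p in k_nearest(SAMPLE, query_lat, query_lon, 3)])
```

Both functions scan every record, which is only reasonable for small inputs. A distributed framework of the kind the abstract describes would typically partition or index the data so that each query touches only a subset of it.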
Acknowledgement
This work is a part of a research project on ‘Developing Data Analytics Architecture, Applications in Agriculture’, funded by NRDMS and NSDI, Department of Science and Technology, Govt. of India, year 2017–2019.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Shah, P., Chaudhary, S. (2018). Big Data Analytics Framework for Spatial Data. In: Mondal, A., Gupta, H., Srivastava, J., Reddy, P., Somayajulu, D. (eds) Big Data Analytics. BDA 2018. Lecture Notes in Computer Science, vol 11297. Springer, Cham. https://doi.org/10.1007/978-3-030-04780-1_17
DOI: https://doi.org/10.1007/978-3-030-04780-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04779-5
Online ISBN: 978-3-030-04780-1
eBook Packages: Computer Science, Computer Science (R0)