Big Data Analytics Framework for Spatial Data

  • Conference paper
Big Data Analytics (BDA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11297)

Abstract

Mobile devices and the Internet generate large volumes of data with spatial components. Modern users demand fast, scalable, and cost-effective solutions for performing analytics on massively distributed data, including spatial data. Traditional spatial data management systems are increasingly unable to meet these demands because of poor scalability and limited computational power and storage. A promising approach is to develop data-intensive spatial applications on parallel distributed architectures deployed on commodity clusters. This paper presents an open-source big data analytics framework to load, store, process, and perform ad-hoc query processing on spatial and non-spatial data at scale. The system is built on top of the Spark framework, with the NoSQL database Cassandra as a new input data source. The implementation covers analytics operations such as filtration, aggregation, exact match, proximity, and K-nearest-neighbor search. The paper also provides an application architecture that accelerates ad-hoc query processing by routing each user query to the more suitable framework, either Cassandra or Spark, via a common web-based REST interface. The framework is evaluated by measuring system latency against varying data sizes.
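To make the proximity and K-nearest-neighbor operations named in the abstract concrete, the following is a minimal single-machine sketch in plain Python. It is not the paper's Spark/Cassandra implementation: the function names, the `(id, lat, lon)` tuple layout, the sample coordinates, and the use of the haversine great-circle distance are all assumptions made for illustration.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_search(points, lat, lon, radius_km):
    """Return every (id, lat, lon) point within radius_km of the query location."""
    return [p for p in points if haversine_km(lat, lon, p[1], p[2]) <= radius_km]


def k_nearest(points, lat, lon, k):
    """Return the k points closest to the query location, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


# Hypothetical sample points (id, latitude, longitude); not data from the paper.
SAMPLE_POINTS = [
    ("p1", 40.7580, -73.9855),
    ("p2", 40.7484, -73.9857),
    ("p3", 40.6892, -74.0445),
    ("p4", 40.7812, -73.9665),
]

if __name__ == "__main__":
    query_lat, query_lon = 40.7580, -73.9855
    print(proximity_search(SAMPLE_POINTS, query_lat, query_lon, radius_km=2.0))
    print(k_nearest(SAMPLE_POINTS, query_lat, query_lon, k=3))
```

Both functions scan every point, which is adequate for a sketch. A distributed system would typically prune candidates first, for example with a spatial index or a Geohash prefix, before computing exact distances.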

Notes

  1. http://hadoop.apache.org/.

  2. Geohash WG. Geohash. https://www.en.wikipedia.org/wiki/Geohash.

  3. http://spark.rstudio.com/.

  4. http://www.shiny.rstudio.com.

  5. http://www.andresmh.com/nyctaxitrips/.

References

  1. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10(10–10), 95 (2010)

  2. Open Geospatial Consortium. http://www.opengeospatial.org/

  3. Website of MongoDB. http://www.mongodb.org

  4. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. ACM SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)

  5. Ben Brahim, M., Drira, W., Filali, F., Noureddine, H.: Spatial data extension for Cassandra NoSQL database. J. Big Data 3(1), 11 (2016)

  6. Eldawy, A., Mokbel, M.F.: Spatialhadoop: a MapReduce framework for spatial data. In: 2015 IEEE 31st International Conference on Data Engineering (ICDE), pp. 1352–1363. IEEE (2015)

  7. Aji, A., et al.: Hadoop gis: a high performance spatial data warehousing system over MapReduce. Proc. VLDB Endowment 6(11), 1009–1020 (2013)

  8. Yu, J., Wu, J., Sarwat, M.: Geospark: a cluster computing framework for processing large-scale spatial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, p. 70. ACM (2015)

  9. Website of Magellan. https://github.com/harsha2010/magellan; https://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/

  10. Website of Spatialspark. http://simin.me/projects/spatialspark/

  11. R Core Team: R: a language and environment for statistical computing. In: R Foundation for Statistical Computing, Vienna, Austria 2013 (2014)

  12. Eldawy, A., Mokbel, M.F.: Pigeon: a spatial MapReduce language. In: 2014 IEEE 30th International Conference on Data Engineering (ICDE), pp. 1242–1245. IEEE (2014)

  13. Website of spark-cassandra-connector. https://github.com/datastax/spark-cassandra-connector

  14. Güting, R.H.: An introduction to spatial database systems. VLDB J. Int. J. Very Large Data Bases 3(4), 357–399 (1994)

  15. Eldawy, A., Li, Y., Mokbel, M.F., Janardan, R.: CG_Hadoop: computational geometry in MapReduce. In: Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 294–303. ACM (2013)

Acknowledgement

This work is part of the research project ‘Developing Data Analytics Architecture, Applications in Agriculture’, funded by NRDMS and NSDI, Department of Science and Technology, Govt. of India, 2017–2019.

Author information

Corresponding author

Correspondence to Purnima Shah.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Shah, P., Chaudhary, S. (2018). Big Data Analytics Framework for Spatial Data. In: Mondal, A., Gupta, H., Srivastava, J., Reddy, P., Somayajulu, D. (eds) Big Data Analytics. BDA 2018. Lecture Notes in Computer Science, vol 11297. Springer, Cham. https://doi.org/10.1007/978-3-030-04780-1_17

  • DOI: https://doi.org/10.1007/978-3-030-04780-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04779-5

  • Online ISBN: 978-3-030-04780-1

  • eBook Packages: Computer Science, Computer Science (R0)
