Clustering Algorithms for Spatial Big Data

  • Conference paper
Computational Science and Its Applications – ICCSA 2017 (ICCSA 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10407)

Abstract

People and devices now generate data constantly. User activity produces data about needs and preferences, and about the quality of users' experiences, in many ways: for example, streaming a video, reading the news, searching for a restaurant or a hotel, playing a game with others, making purchases, or driving a car. Even when people put their devices in their pockets, the network generates location and other data that keep services running and ready to use. These rapid developments in the availability of and access to data, in particular spatially referenced data, across different areas have created a need for better analysis techniques to understand the underlying phenomena. Spatial clustering algorithms, which group similar spatial objects into classes, can be used to identify areas sharing common characteristics. The aim of this paper is to analyze the performance of three clustering algorithms on spatial big data: the Density-Based Spatial Clustering of Applications with Noise algorithm (DBSCAN), the Fast Search by Density Peak algorithm (FSDP), and the classic K-means algorithm (K-Means). We propose a modification of the FSDP algorithm to improve its efficiency on large databases. The applications concern both synthetic data sets and satellite images.
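
As an illustration of the density-peak idea behind FSDP, the following is a minimal sketch of the basic procedure as published by Rodriguez and Laio, not the modification proposed in this paper. The function name `density_peaks`, the cutoff parameter `d_c`, and the toy points are assumptions made for the example. Each point gets a local density (the number of neighbours within `d_c`) and a distance to the nearest denser point; points where both are large are taken as cluster centres, and every other point inherits the label of its nearest denser neighbour.

```python
import math

def density_peaks(points, d_c, n_clusters):
    """Toy O(n^2) density-peak clustering; returns one cluster label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n    # distance to the nearest denser point
    parent = [-1] * n    # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Cluster centres: the n_clusters points with the largest rho * delta.
    centres = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # Every other point takes the label of its nearest denser neighbour,
    # which has already been labelled because we go in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Two well-separated toy groups (hypothetical data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(density_peaks(pts, d_c=1.0, n_clusters=2))
# -> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The sketch builds the full pairwise distance matrix, so time and memory grow quadratically with the number of points; that cost is what becomes limiting on large databases.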

Author information

Corresponding author

Correspondence to Gabriella Schoier.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Schoier, G., Gregorio, C. (2017). Clustering Algorithms for Spatial Big Data. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10407. Springer, Cham. https://doi.org/10.1007/978-3-319-62401-3_41

  • DOI: https://doi.org/10.1007/978-3-319-62401-3_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62400-6

  • Online ISBN: 978-3-319-62401-3

  • eBook Packages: Computer Science (R0)
