Abstract
In our time people and devices constantly generate data. User activity generates data about needs and preferences, as well as about the quality of users' experiences, in many ways: e.g. streaming a video, reading the news, searching for a restaurant or a hotel, playing a game with others, making purchases, driving a car. Even when people put their devices in their pockets, the network is generating location and other data that keep services running and ready to use. These rapid developments in the availability of and access to data, in particular spatially referenced data in different areas, have created the need for better analysis techniques to understand the various phenomena. Spatial clustering algorithms, which group similar spatial objects into classes, can be used to identify areas sharing common characteristics. The aim of this paper is to analyze the performance of three clustering algorithms, namely the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, the Fast Search by Density Peak (FSDP) algorithm and the classic K-means algorithm, in the analysis of spatial big data. We propose a modification of the FSDP algorithm in order to improve its efficiency on large databases. The applications concern both synthetic data sets and satellite images.
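As a rough illustration of the density-peak idea that FSDP-style clustering relies on, the sketch below computes, for each point, a local density and the distance to the nearest point of higher density, then picks as centers the points where the product of the two is largest. This is a minimal quadratic-time sketch of the general technique, not the modified algorithm proposed in the paper. The function name `density_peaks` and the parameters `d_c` (cutoff distance) and `n_clusters` are hypothetical names chosen for illustration.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Illustrative density-peak clustering (O(n^2) time and memory)."""
    n = len(points)
    # Full pairwise Euclidean distance matrix.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points strictly closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Indices sorted by decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n    # distance to the nearest higher-ranked point
    parent = [-1] * n    # index of that higher-ranked point
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, use its largest distance.
            delta[i] = max(dist[i])
        else:
            j = min(order[:rank], key=lambda k: dist[i][k])
            delta[i] = dist[i][j]
            parent[i] = j

    # Centers: points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))
    centers = centers[:n_clusters]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # Every other point inherits the label of its parent, which has
    # already been labelled because points are visited in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Two well-separated synthetic groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peaks(points, d_c=1.0, n_clusters=2)
```

The full distance matrix is what makes this naive form costly on large databases, which is the kind of limitation an efficiency-oriented modification would need to address.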
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Schoier, G., Gregorio, C. (2017). Clustering Algorithms for Spatial Big Data. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10407. Springer, Cham. https://doi.org/10.1007/978-3-319-62401-3_41
DOI: https://doi.org/10.1007/978-3-319-62401-3_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62400-6
Online ISBN: 978-3-319-62401-3
eBook Packages: Computer Science, Computer Science (R0)