Abstract
The need to organise large streams of Earth Observation (EO) data calls for the efficient clustering of image patches, derived from satellite imagery, into groups. Since the concepts depicted in the satellite image patches are not known a priori, DBSCAN-Martingale can be applied to estimate the number of clusters. In this paper we provide a parallel version of the DBSCAN-Martingale algorithm and a framework for clustering EO data in an unsupervised way. The approach is evaluated on a benchmark dataset of Sentinel-2 images with ground-truth annotation and is also implemented on High Performance Computing (HPC) infrastructure to demonstrate its scalability. Finally, a cost-benefit analysis is conducted to find the optimal selection of reserved nodes for running the proposed algorithm, in terms of execution time and cost.
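The estimation step can be illustrated with a small sketch. The code below is one reading of the DBSCAN-Martingale process described by Gialampoukidis et al., not the authors' implementation. Each realization draws T density levels ε uniformly from [0, ε_max] and sorts them in increasing order. It then runs DBSCAN at each level and registers as new clusters only those groups of points that were not clustered at an earlier level. The number of clusters is taken as the most frequent count over R independent realizations. Because realizations are independent, they are a natural unit of parallelism. Here they are handed to a thread pool for portability; an HPC deployment would instead distribute them across processes or nodes. The function names, the brute-force O(n²) DBSCAN, and the rule that leftover groups smaller than minPts are treated as noise are illustrative assumptions.

```python
import math
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN. Returns one label per point: 0 = noise, 1..k = cluster id."""
    n = len(points)
    labels = [None] * n  # None marks an unvisited point

    def neighbours(i):
        # The eps-neighbourhood includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = 0  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == 0:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                seeds.extend(j_nbrs)  # j is a core point: keep expanding
    return labels


def martingale_realization(points, eps_max, min_pts, T, seed):
    """Hypothetical helper: clusters found in one realization over T increasing eps levels."""
    rng = random.Random(seed)
    eps_levels = sorted(rng.uniform(0.0, eps_max) for _ in range(T))
    assigned = [0] * len(points)  # 0 = not clustered at any earlier level
    k = 0
    for eps in eps_levels:
        groups = {}
        for idx, lab in enumerate(dbscan(points, eps, min_pts)):
            if lab > 0 and assigned[idx] == 0:
                groups.setdefault(lab, []).append(idx)
        for members in groups.values():
            if len(members) >= min_pts:  # assumption: smaller leftovers stay noise
                k += 1
                for idx in members:
                    assigned[idx] = k
    return k


def parallel_dbscan_martingale(points, eps_max, min_pts, T=5, R=20, workers=4, seed=0):
    """Hypothetical driver: run R independent realizations concurrently, return the mode."""
    def run(r):
        return martingale_realization(points, eps_max, min_pts, T, seed + r)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = Counter(pool.map(run, range(R)))
    k_hat = counts.most_common(1)[0][0]
    return k_hat, counts
```

For example, `parallel_dbscan_martingale(points, eps_max=1.0, min_pts=4)` returns the estimated number of clusters together with the distribution of counts across realizations. Python threads do not speed up CPU-bound work because of the GIL, so real speed-ups would require process pools or distributed workers.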
Keywords
- Density-based clustering
- Image clustering
- High Performance Computing
Acknowledgement
This work has been supported by the EU’s Horizon 2020 research and innovation programme under grant agreements H2020-101004152 CALLISTO and H2020-776019 EOPEN.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Gialampoukidis, I. et al. (2022). Parallel DBSCAN-Martingale Estimation of the Number of Concepts for Automatic Satellite Image Clustering. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_8
DOI: https://doi.org/10.1007/978-3-030-98358-1_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98357-4
Online ISBN: 978-3-030-98358-1
eBook Packages: Computer Science, Computer Science (R0)