Parallel DBSCAN-Martingale Estimation of the Number of Concepts for Automatic Satellite Image Clustering

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13141)

Abstract

The need to organise large streams of Earth Observation (EO) data calls for efficient clustering of image patches, derived from satellite imagery, into groups. Since the concepts depicted in the satellite image patches are not known a priori, DBSCAN-Martingale can be applied to estimate the number of clusters. In this paper we provide a parallel version of the DBSCAN-Martingale algorithm and a framework for clustering EO data in an unsupervised way. The approach is evaluated on a benchmark dataset of Sentinel-2 images with ground-truth annotation and is also implemented on High Performance Computing (HPC) infrastructure to demonstrate its scalability. Finally, a cost-benefit analysis is conducted to find the optimal selection of reserved nodes for running the proposed algorithm, with respect to execution time and cost.
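To make the idea in the abstract concrete, the following is a minimal sketch of how a DBSCAN-Martingale style estimate of the number of clusters might be computed, and why it lends itself to parallel execution. It is not the authors' implementation. It assumes a simplified reading of the method: each realisation draws a few random density levels (eps), runs DBSCAN from the smallest to the largest, and counts only clusters formed by points that were not clustered at a smaller eps; the estimate is the most frequent count over many independent realisations. The function names (`dbscan`, `one_realisation`, `estimate_k`), the parameter defaults, and the uniform sampling of eps are all illustrative assumptions.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import dist


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, 0 meaning noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [0] * n
    cluster = 0
    for i in range(n):
        if labels[i] != 0 or len(neighbours[i]) < min_pts:
            continue  # already assigned, or not a core point
        cluster += 1
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            if len(neighbours[p]) < min_pts:
                continue  # border point: joins the cluster but does not expand it
            for q in neighbours[p]:
                if labels[q] == 0:
                    labels[q] = cluster
                    frontier.append(q)
    return labels


def one_realisation(points, eps_max, min_pts, steps, seed):
    """One random realisation: returns the number of clusters it extracts."""
    rng = random.Random(seed)
    # Assumption: density levels are drawn uniformly from (0, eps_max).
    eps_values = sorted(rng.uniform(0.0, eps_max) for _ in range(steps))
    assigned = [False] * len(points)
    k = 0
    for eps in eps_values:
        labels = dbscan(points, eps, min_pts)
        # Count, per cluster, only the points not clustered at a smaller eps.
        sizes = Counter(l for l, done in zip(labels, assigned) if l and not done)
        kept = {l for l, size in sizes.items() if size >= min_pts}
        k += len(kept)
        for i, l in enumerate(labels):
            if l in kept:
                assigned[i] = True
    return k


def estimate_k(points, eps_max, min_pts=4, steps=5, realisations=20, workers=4):
    """Most frequent cluster count over independent realisations."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ks = list(pool.map(
            lambda seed: one_realisation(points, eps_max, min_pts, steps, seed),
            range(realisations)))
    return Counter(ks).most_common(1)[0][0]
```

Calling `estimate_k(points, eps_max=2.0)` on a list of 2-D coordinate tuples returns the estimated number of clusters. Because the realisations share no state, they can be distributed freely; the thread pool here only illustrates that independence, and a real speed-up for CPU-bound work would need separate processes or separate HPC nodes, one batch of realisations per worker.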

Keywords

  • Density-based clustering
  • Image clustering
  • High Performance Computing

Notes

  1. http://bigearth.net/

  2. https://www.hlrs.de/solutions-services/academic-users/legal-requirements/

Acknowledgement

This work has been supported by the EU’s Horizon 2020 research and innovation programme under grant agreements H2020-101004152 CALLISTO and H2020-776019 EOPEN.

Author information

Corresponding author

Correspondence to Stelios Andreadis.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Gialampoukidis, I. et al. (2022). Parallel DBSCAN-Martingale Estimation of the Number of Concepts for Automatic Satellite Image Clustering. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_8

  • DOI: https://doi.org/10.1007/978-3-030-98358-1_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98357-4

  • Online ISBN: 978-3-030-98358-1

  • eBook Packages: Computer Science, Computer Science (R0)