Detecting global hyperparaboloid correlated clusters: a Hough-transform based multicore algorithm

  • Daniyal Kazempour
  • Markus Mauder
  • Peer Kröger
  • Thomas Seidl
Article
Part of the following topical collections:
  1. Special Issue on Scientific and Statistical Data Management

Abstract

Correlation clustering detects complex and intricate relationships in high-dimensional data by identifying groups of data points, each characterized by a different correlation among a (sub)set of features. Current correlation clustering methods generally limit themselves to linear correlations. In this paper, we introduce a method for detecting global non-linear correlated clusters, focusing on quadratic relations. We introduce a novel Hough transform for the detection of hyperparaboloids and apply it to the detection of hyperparaboloid correlated clusters in arbitrary high-dimensional data spaces. We further provide a solution for utilizing all available CPU cores on a system: we split the Hough space along a pre-defined axis into a number of equi-sized partitions. We show that even this simple form of parallelization improves the runtime significantly. Non-linear correlation clustering methods such as ours can reveal valuable insights that current linear methods do not capture. Our empirical results on synthetic and real-world data show that the proposed method is robust against noise, jitter, and irregular densities.
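To make the partitioning idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-dimensional toy case in which each cluster follows y = a*x^2 + b*x + c, a discretized Hough space over (a, b, c), and a split of the a-axis into contiguous, near-equal chunks. The function names (split_axis, vote_partition, hough_parabola), the grid values, and the bin width c_step are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def split_axis(values, n_parts):
    """Split grid values into n_parts contiguous, near-equal chunks."""
    size, extra = divmod(len(values), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return [chunk for chunk in chunks if chunk]  # drop empty chunks


def vote_partition(task):
    """Accumulate votes for one partition of the a-axis (assumed split axis)."""
    points, a_chunk, b_values, c_step = task
    votes = Counter()
    for a in a_chunk:
        for b in b_values:
            for x, y in points:
                # Each point votes for the c that puts it on the parabola (a, b, c).
                c = y - a * x * x - b * x
                votes[(a, b, round(c / c_step))] += 1
    return votes


def hough_parabola(points, a_values, b_values, c_step, n_parts, map_fn=map):
    """Return the best-supported (a, b, c) cell and its vote count.

    map_fn defaults to the serial built-in map; an executor's map method
    can be passed instead to process the partitions concurrently.
    """
    tasks = [(points, chunk, b_values, c_step)
             for chunk in split_axis(a_values, n_parts)]
    total = Counter()
    for partial_votes in map_fn(vote_partition, tasks):
        total.update(partial_votes)  # partitions are disjoint, so counts just merge
    (a, b, c_bin), count = total.most_common(1)[0]
    return (a, b, c_bin * c_step), count


# Toy data: 11 points on y = 2x^2 - x + 3 plus two noise points.
points = [(x, 2 * x * x - x + 3) for x in range(-5, 6)] + [(0, 10), (1, -7)]
grid = list(range(-3, 4))
best, count = hough_parabola(points, grid, grid, c_step=1.0, n_parts=4)
```

Because the partitions cover disjoint regions of the Hough space, each one can be voted on independently and the results merged afterwards. Under this sketch's assumptions, passing the map method of a concurrent.futures.ProcessPoolExecutor as map_fn would distribute the partitions across CPU cores.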

Keywords

Data mining, Non-linear correlation clustering, Hough transform, Multicore


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Daniyal Kazempour
    • 1
  • Markus Mauder
    • 1
  • Peer Kröger
    • 1
  • Thomas Seidl
    • 1
  1. Institut für Informatik - Lehrstuhl für Datenbanksysteme und Data Mining, Munich, Germany
