Dimensionally Distributed Density Estimation

  • Pasi Fränti
  • Sami Sieranoja
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10842)

Abstract

Density estimation is needed in several clustering algorithms and other data analysis methods. A straightforward calculation takes O(N²) time because it requires all pairwise distances, and this is the main bottleneck in making such algorithms scalable. We propose a faster O(N log N) time algorithm that calculates density estimates in each dimension separately and then simply accumulates the individual estimates into the final density values.
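The abstract does not say which density measure is used within each dimension, so the following Python sketch assumes one: the distance from each point to its k-th nearest neighbour along a single axis, found after sorting that axis, with the per-dimension distances summed into one score (a smaller sum means a denser neighbourhood). The function names `knn_distance_1d`, `dimensional_density`, and `knn_distance_bruteforce` are hypothetical. The brute-force function is included only to illustrate the O(N²) pairwise baseline the abstract contrasts against. This is an illustration of the per-dimension idea, not the authors' implementation, and it needs Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence


def knn_distance_1d(values: Sequence[float], k: int) -> List[float]:
    """Distance from each value to its k-th nearest neighbour on one axis.

    Sorting is O(N log N). In sorted order the k nearest neighbours of a
    point lie among the k positions on either side, so the scan is O(N k).
    """
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    order = sorted(range(n), key=lambda i: values[i])
    s = [values[i] for i in order]  # values in ascending order
    result = [0.0] * n
    for pos, idx in enumerate(order):
        lo, hi = pos - 1, pos + 1
        d = 0.0
        # Expand outward, always taking the closer side; after k steps,
        # d is the distance to the k-th nearest neighbour on this axis.
        for _ in range(k):
            left = s[pos] - s[lo] if lo >= 0 else math.inf
            right = s[hi] - s[pos] if hi < n else math.inf
            if left <= right:
                d, lo = left, lo - 1
            else:
                d, hi = right, hi + 1
        result[idx] = d
    return result


def dimensional_density(points: Sequence[Sequence[float]], k: int = 1) -> List[float]:
    """Sum of per-dimension k-NN distances (assumed estimator; smaller = denser)."""
    n, dims = len(points), len(points[0])
    total = [0.0] * n
    for j in range(dims):
        column = [p[j] for p in points]
        for i, dist in enumerate(knn_distance_1d(column, k)):
            total[i] += dist
    return total


def knn_distance_bruteforce(points: Sequence[Sequence[float]], k: int = 1) -> List[float]:
    """O(N^2) baseline: true Euclidean distance to the k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0)]
    # In both outputs the isolated last point gets the largest value.
    print(dimensional_density(pts, k=1))
    print(knn_distance_bruteforce(pts, k=1))
```

For fixed dimensionality d and neighbourhood size k, this sketch costs O(d·N log N) for sorting plus O(d·N·k) for the scans, compared with O(d·N²) for the brute-force baseline. The summed 1-D distances are only a proxy: a point's nearest neighbour on one axis need not be its nearest neighbour on another, so the score is not equal to the true k-NN distance.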

Keywords

Clustering · Density estimation · Density peaks · K-means

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computing, University of Eastern Finland, Joensuu, Finland