Efficient Cluster Detection by Ordered Neighborhoods

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9263)

Abstract

Detecting cluster structures seems to be a simple task, i.e., separating similar from dissimilar objects. However, given today's complex data, (dis-)similarity measures and traditional clustering algorithms are not reliable in separating clusters from each other. For example, when too many dimensions are considered simultaneously, objects become effectively unique, and (dis-)similarity no longer provides meaningful information for detecting clusters. While the (dis-)similarity measures might be meaningful in individual dimensions, algorithms fail to combine this information for cluster detection. In particular, it is the severe issue of a combinatorial search space over subsets of dimensions that results in inefficient algorithms.
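As an aside on why full-space similarity degrades, the following sketch (illustrative only, not taken from the paper) demonstrates the well-known concentration effect: for uniform data, the relative contrast between a query point's nearest and farthest neighbor shrinks as dimensionality grows, which is exactly what makes full-space (dis-)similarity uninformative.

```python
# Illustrative sketch (not from the paper): as the number of dimensions d
# grows, the relative contrast between the nearest and farthest neighbor
# of a query point collapses toward zero, so distances lose meaning.
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    points = rng.random((1000, d))   # 1000 uniform points in [0, 1]^d
    query = rng.random(d)            # one uniform query point
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast={contrast:.3f}")
```

Running this prints a contrast that is large for d=2 but drops steadily as d increases, matching the observation above that objects become effectively unique in high-dimensional spaces.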

In this paper we propose a cluster detection method based on ordered neighborhoods. By considering such ordered neighborhoods in each dimension individually, we derive properties that allow us to detect the clustered objects of a dimension in linear time. Our algorithm exploits the ordered neighborhoods to find both the similar objects and the dimensions in which these objects show high similarity. Evaluation results show that our method scales with both database size and dimensionality, and that it enhances cluster detection quality compared with state-of-the-art clustering techniques.
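To make the per-dimension idea concrete, here is a minimal sketch under our own assumptions, not the authors' algorithm: the parameters `eps` (neighborhood width) and `min_size` are hypothetical. After sorting one dimension, a single linear scan over the ordered values groups objects whose consecutive gaps stay below `eps`, so detecting candidate clustered objects in a dimension costs one sort plus one O(n) pass.

```python
# Minimal sketch (our own simplification, not the paper's method):
# detect candidate clusters in a single dimension by sorting the values
# once, then scanning the ordered sequence. Objects whose consecutive
# gaps stay within a hypothetical width threshold `eps` form one group.
from typing import List

def clusters_in_dimension(values: List[float], eps: float,
                          min_size: int = 2) -> List[List[int]]:
    """Return groups of object indices whose sorted values form
    gap-bounded runs in this dimension."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters: List[List[int]] = []
    current = [order[0]] if order else []
    for prev, curr in zip(order, order[1:]):
        if values[curr] - values[prev] <= eps:
            current.append(curr)       # still inside the same dense run
        else:
            if len(current) >= min_size:
                clusters.append(current)
            current = [curr]           # gap too wide: start a new run
    if len(current) >= min_size:
        clusters.append(current)
    return clusters

# Example: two dense runs separated by wide gaps.
print(clusters_in_dimension([0.1, 0.15, 0.2, 5.0, 5.1, 9.9], eps=0.5))
# -> [[0, 1, 2], [3, 4]]
```

In this reading, the ordering is computed once per dimension and then reused, which is what keeps the per-dimension detection linear apart from the initial sort; combining the per-dimension groups into subspace clusters is the part the paper's algorithm addresses beyond this sketch.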


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Emin Aksehirli (1)
  • Bart Goethals (1)
  • Emmanuel Müller (1, 2)

  1. University of Antwerp, Antwerp, Belgium
  2. Karlsruhe Institute of Technology, Karlsruhe, Germany