Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?

Houle, Michael E.; Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur

doi:10.1007/978-3-642-13818-8_34

Michael E. Houle¹⁸,
Hans-Peter Kriegel¹⁹,
Peer Kröger¹⁹,
Erich Schubert¹⁹ &
…
Arthur Zimek¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6187))

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

2722 Accesses
120 Citations
3 Altmetric

Abstract

The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called ‘curse of dimensionality’ have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can result in more stable performance than their associated primary distance measures.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)
Chapter Google Scholar
Hinneburg, A., Aggarwal, C.C., Keim, D.A.: What is the nearest neighbor in high dimensional spaces? In: Proc. VLDB (2000)
Google Scholar
Aggarwal, C.C., Hinneburg, A., Keim, D.: On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, p. 420. Springer, Heidelberg (2000)
Chapter Google Scholar
Bennett, K.P., Fayyad, U., Geiger, D.: Density-based indexing for approximate nearest-neighbor queries. In: Proc. KDD (1999)
Google Scholar
Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD 3(1), 1–58 (2009)
Article Google Scholar
Aggarwal, C.C.: Re-designing distance functions and distance-based applications for high dimensional data. SIGMOD Record 30(1), 13–18 (2001)
Article Google Scholar
Domeniconi, C., Papadopoulos, D., Gunopulos, D., Ma, S.: Subspace clustering of high dimensional data. In: Proc. SDM (2004)
Google Scholar
Woo, K.G., Lee, J.H., Kim, M.H., Lee, Y.J.: FINDIT: a fast and intelligent subspace clustering algorithm using dimension voting. Inform. Software Technol. 46(4), 255–271 (2004)
Article Google Scholar
Parsons, L., Haque, E., Liu, H.: Subspace clustering for high dimensional data: A review. SIGKDD Explorations 6(1), 90–105 (2004)
Article Google Scholar
Yiu, M.L., Mamoulis, N.: Iterative projected clustering by subspace mining. IEEE TKDE 17(2), 176–189 (2005)
Google Scholar
Liu, G., Li, J., Sim, K., Wong, L.: Distance based subspace clustering with flexible dimension partitioning. In: Proc. ICDE (2007)
Google Scholar
Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: A general framework for increasing the robustness of PCA-based correlation clustering algorithms. In: Ludäscher, B., Mamoulis, N. (eds.) SSDBM 2008. LNCS, vol. 5069, pp. 418–435. Springer, Heidelberg (2008)
Chapter Google Scholar
Moise, G., Sander, J.: Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: Proc. KDD (2008)
Google Scholar
Achtert, E., Böhm, C., David, J., Kröger, P., Zimek, A.: Global correlation clustering based on the Hough transform. Stat. Anal. Data Min. 1(3), 111–127 (2008)
Article Google Scholar
Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. In: Proc. SIGMOD (2001)
Google Scholar
Zhu, C., Kitagawa, H., Faloutsos, C.: Example-based robust outlier detection in high dimensional datasets. In: Proc. ICDM (2005)
Google Scholar
Kriegel, H.P., Schubert, M., Zimek, A.: Angle-based outlier detection in high-dimensional data. In: Proc. KDD (2008)
Google Scholar
Müller, E., Assent, I., Steinhausen, U., Seidl, T.: OutRank: ranking outliers in high dimensional data. In: Proc. ICDE Workshop DBRank (2008)
Google Scholar
Katayama, N., Satoh, S.: Distinctiveness-sensitive nearest-neighbor search for efficient similarity retrieval of multimedia information. In: Proc. ICDE (2001)
Google Scholar
Aggarwal, C.C., Yu, P.S.: Finding generalized projected clusters in high dimensional space. In: Proc. SIGMOD (2000)
Google Scholar
Berchtold, S., Böhm, C., Jagadish, H.V., Kriegel, H.P., Sander, J.: Independent Quantization: An index compression technique for high-dimensional data spaces. In: Proc. ICDE (2000)
Google Scholar
Jin, H., Ooi, B.C., Shen, H.T., Yu, C., Zhou, A.Y.: An adaptive and efficient dimensionality reduction algorithm for high-dimensional indexing. In: Proc. ICDE (2003)
Google Scholar
Aggarwal, C.C., Yu, P.S.: On high dimensional indexing of uncertain data. In: Proc. ICDE (2008)
Google Scholar
Francois, D., Wertz, V., Verleysen, M.: The concentration of fractional distances. IEEE TKDE 19(7), 873–886 (2007)
Google Scholar
Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proc. SDM (2003)
Google Scholar
Houle, M.E.: Navigating massive data sets via local clustering. In: Proc. KDD (2003)
Google Scholar
Guha, S., Rastogi, R., Shim, K.: CURE: An efficient clustering algorithm for large databases. In: Proc. SIGMOD, pp. 73–84 (1998)
Google Scholar
Jarvis, R.A., Patrick, E.A.: Clustering using a similarity measure based on shared near neighbors. IEEE TC C-22(11), 1025–1034 (1973)
Google Scholar
Houle, M.E.: The relevant-set correlation model for data clustering. Stat. Anal. Data Min. 1(3), 157–176 (2008)
Article Google Scholar
Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: Outlier detection in axis-parallel subspaces of high dimensional data. In: Proc. PAKDD (2009)
Google Scholar
Faloutsos, C., Kamel, I.: Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension. In: Proc. SIGMOD (1994)
Google Scholar
Belussi, A., Faloutsos, C.: Estimating the selectivity of spatial queries using the ‘correlation’ fractal dimension. In: Proc. VLDB (1995)
Google Scholar
Pagel, B.U., Korn, F., Faloutsos, C.: Deflating the dimensionality curse using multiple fractal dimensions. In: Proc. ICDE (2000)
Google Scholar
Korn, F., Pagel, B.U., Falutsos, C.: On the “dimensionality curse” and the “self-similarity blessing”. IEEE TKDE 13(1), 96–111 (2001)
Google Scholar
Asuncion, A., Newman, D.J.: UCI Machine Learning Repository (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html
Geusebroek, J.M., Burghouts, G.J., Smeulders, A.: The Amsterdam Library of Object Images. Int. J. Computer Vision 61(1), 103–112 (2005)
Article Google Scholar
Boujemaa, N., Fauqueur, J., Ferecatu, M., Fleuret, F., Gouet, V., Saux, B.L., Sahbi, H.: IKONA: Interactive generic and specific image retrieval. In: Proc. MMCBIR, pp. 25–28 (2001)
Google Scholar

Download references

Author information

Authors and Affiliations

National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
Michael E. Houle
Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80538, München, Germany
Hans-Peter Kriegel, Peer Kröger, Erich Schubert & Arthur Zimek

Authors

Michael E. Houle
View author publications
You can also search for this author in PubMed Google Scholar
Hans-Peter Kriegel
View author publications
You can also search for this author in PubMed Google Scholar
Peer Kröger
View author publications
You can also search for this author in PubMed Google Scholar
Erich Schubert
View author publications
You can also search for this author in PubMed Google Scholar
Arthur Zimek
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Institute of Computer Science, University of Heidelberg, 69120, Heidelberg, Germany
Michael Gertz
Dept. of Computer Science and Genome Center, University of California, One Shields Avenue, 95616, Davis, CA, USA
Bertram Ludäscher

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Houle, M.E., Kriegel, HP., Kröger, P., Schubert, E., Zimek, A. (2010). Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?. In: Gertz, M., Ludäscher, B. (eds) Scientific and Statistical Database Management. SSDBM 2010. Lecture Notes in Computer Science, vol 6187. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13818-8_34

Download citation

DOI: https://doi.org/10.1007/978-3-642-13818-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13817-1
Online ISBN: 978-3-642-13818-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics