
Scientific and Statistical Database Management

Volume 6187 of the series Lecture Notes in Computer Science, pp. 482–500

Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?

  • Michael E. Houle, affiliated with National Institute of Informatics
  • Hans-Peter Kriegel, affiliated with Ludwig-Maximilians-Universität München
  • Peer Kröger, affiliated with Ludwig-Maximilians-Universität München
  • Erich Schubert, affiliated with Ludwig-Maximilians-Universität München
  • Arthur Zimek, affiliated with Ludwig-Maximilians-Universität München

Abstract

The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called ‘curse of dimensionality’ have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can result in more stable performance than their associated primary distance measures.
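To make the idea of a secondary, rank-based similarity concrete, here is a minimal sketch of the basic shared-nearest-neighbor overlap count. It assumes Euclidean distance as the primary measure, brute-force neighbor search, and that each point counts as its own nearest neighbor. The helper names (`knn_set`, `snn_similarity`, `snn_fraction`) are hypothetical, and this is not necessarily any of the exact shared-neighbor variants evaluated in the paper.

```python
import math

def knn_set(points, i, k, dist=math.dist):
    """Indices of the k nearest neighbors of points[i] under the primary distance.

    Assumption: the query point itself is included as its own nearest neighbor.
    Ties are broken by index so the result is deterministic.
    """
    order = sorted(range(len(points)), key=lambda j: (dist(points[i], points[j]), j))
    return set(order[:k])

def snn_similarity(points, i, j, k, dist=math.dist):
    """Secondary similarity: size of the overlap of the two k-NN sets."""
    return len(knn_set(points, i, k, dist) & knn_set(points, j, k, dist))

def snn_fraction(points, i, j, k, dist=math.dist):
    """One common normalization (assumed here): overlap divided by k, in [0, 1]."""
    return snn_similarity(points, i, j, k, dist) / k

# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(snn_similarity(points, 0, 1, 3))  # same group: neighbor sets coincide
print(snn_similarity(points, 0, 3, 3))  # different groups: no shared neighbors
```

Because only the neighbor rankings enter the secondary measure, any strictly increasing transformation of the primary distance (for example, squaring the Euclidean distance) leaves the shared-neighbor similarity unchanged.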